AI Video Tools

Explore the best AI Video Tools — independent reviews, comparisons, pricing and step-by-step how-to guides, curated by Aizhi.

  • SciPy

    SciPy

    SciPy (pronounced "sigh pie") is a free and open-source Python library used for scientific computing and technical computing. SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, fast Fourier transform, signal and image processing, ordinary differential equation solvers and other tasks common in science and engineering. SciPy is also a family of conferences for users and developers of these tools: SciPy (in the United States), EuroSciPy (in Europe) and SciPy.in (in India). Enthought originated the SciPy conference in the United States and continues to sponsor many of the international conferences as well as host the SciPy website. The SciPy library is currently distributed under the BSD license, and its development is sponsored and supported by an open community of developers. It is also supported by NumFOCUS, a community foundation for supporting reproducible and accessible science. == Components == The SciPy package is at the core of Python's scientific computing capabilities. Available sub-packages include: cluster: hierarchical clustering, vector quantization, K-means constants: physical constants and conversion factors datasets: various example datasets for demonstrating image and data processing differentiate: numerical differentiation for first and second derivatives fft: Discrete Fourier Transform algorithms fftpack: Legacy interface for Discrete Fourier Transforms integrate: numerical integration routines interpolate: interpolation tools io: data input and output, including support for MATLAB and Matrix Market files linalg: linear algebra routines ndimage: various functions for multi-dimensional image processing odr: orthogonal distance regression classes and algorithms optimize: optimization algorithms including linear programming and a variety of numerical nonlinear programming optimizers signal: signal processing tools sparse: sparse matrices and related algorithms spatial: algorithms for spatial structures such as k-d trees, nearest neighbors, convex hulls, etc. special: special functions stats: statistical functions == Data structures == The basic data structure used by SciPy is a multidimensional array provided by the NumPy module. NumPy provides some functions for linear algebra, Fourier transforms, and random number generation, but not with the generality of the equivalent functions in SciPy. NumPy can also be used as an efficient multidimensional container of data with arbitrary datatypes. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases. Older versions of SciPy used Numeric as an array type, which is now deprecated in favor of the newer NumPy array code. == History == In the 1990s, Python was extended to include an array type for numerical computing called Numeric. (This package was eventually replaced by NumPy, which was written by Travis Oliphant in 2006 as a blending of Numeric and Numarray, with Numarray itself being started in 2001.) As of 2000, there was a growing number of extension modules and increasing interest in creating a complete environment for scientific and technical computing. In 2001, Travis Oliphant, Eric Jones, and Pearu Peterson merged code they had written and called the resulting package SciPy. The newly created package provided a standard collection of common numerical operations on top of the Numeric array data structure. Shortly thereafter, Fernando Pérez released IPython, an enhanced interactive shell widely used in the technical computing community, and John Hunter released the first version of Matplotlib, the 2D plotting library for technical computing. Since then the SciPy environment has continued to grow with more packages and tools for technical computing. == Scientific Python versus ScientificPython == In the scientific literature, SciPy is occasionally referred to as "Scientific Python (SciPy)". This is incorrect: the official name of the project is just "SciPy". Furthermore, expanding "SciPy" as "Scientific Python" may cause confusion with "ScientificPython", a project led by Konrad Hinsen of Orléans University that was active between 1995 and 2014. "Scientific Python" is also used for the related ecosystem of tools.

    Read more →
  • Weibo

    Weibo

    Weibo (Chinese: 微博; pinyin: Wēibó), or Sina Weibo (Chinese: 新浪微博; pinyin: Xīnlàng Wēibó), is a Chinese microblogging (weibo) website. Launched by Sina Corporation on 14 August 2009, it is one of the biggest social media platforms in China, with over 582 million monthly active users (252 million daily active users) as of Q1 2022. The platform has been highly successful but has faced criticism for heavy censorship. Sina had gone public on the Nasdaq in 2000. In March 2014, Sina announced a spinoff of Weibo and filed an IPO under the symbol WB. Sina carved out 11% of Weibo in the IPO, with Alibaba owning 32% post-IPO. The company began trading publicly on 17 April 2014. In March 2017, Sina launched Sina Weibo International Version. In November 2018, Sina Weibo suspended its registration function for minors under the age of 14. In July 2019, Sina Weibo announced that it would launch a two-month campaign to clean up pornographic and vulgar information, named "Project Deep Blue" (蔚蓝计划). On 29 September 2020, the company announced it would go private again due to rising tensions between the US and China. == Name == "Weibo" (微博) is the Chinese word for "microblog". Sina Weibo launched its new domain name weibo.com on 7 April 2011, deactivating and redirecting from the old domain, t.sina.com.cn, to the new one. Due to its popularity, the media sometimes refers to the platform simply as "Weibo", despite the numerous other Chinese microblogging services including Tencent Weibo, Sohu Weibo, and NetEase Weibo. However, the latter three have stopped providing services. == Background == Sina Weibo is a platform based on fostering user relationships to share, disseminate, and receive information. Through the website or the mobile app, users can upload pictures and videos publicly for instant sharing, with other users being able to comment with text, pictures and videos, or use a multimedia instant messaging service. The company initially invited a large number of celebrities to join the platform at the beginning and has since invited many media personalities, government departments, businesses and non-governmental organizations to open accounts for the purpose of publishing and communicating information. To avoid the impersonation of celebrities, Sina Weibo uses verification symbols; celebrity accounts have an orange letter "V" and organizations' accounts have a blue letter "V". Sina Weibo has more than 500 million registered users; out of these, 313 million are monthly active users, 85% use the Weibo mobile app, 70% are college-aged, 50.10% are male and 49.90% are female. There are over 100 million messages posted by users each day. With more than 100 million followers, actress Xie Na holds the record for the most followers on the platform. Despite fierce competition among Chinese social media platforms, Sina Weibo remains the most popular. == History == After the July 2009 Ürümqi riots, China shut down most domestic microblogging services, including Fanfou, the very first weibo service. Many popular non-China-based microblogging services like Twitter, Facebook, and Plurk have since been blocked. Sina Corporation CEO Charles Chao considered this to be an opportunity, and on 14 August 2009, Sina launched the tested version of Sina Weibo. Basic functions including message, private message, comment and reposting were made available that September. A Sina Weibo–compatible API platform for developing third-party applications was launched on 28 July 2010. On 1 December 2010, the website experienced an outage, which administrators later said was due to the ever-increasing numbers of users and posts. Registered users surpassed 100 million in February 2011. Since 23 March 2011, t.cn has been used as Sina Weibo's official shortened URL in lieu of sinaurl.cn. On 7 April 2011, weibo.com replaced t.sina.com.cn as the new main domain name used by the website. The official logo was also updated. In June 2011, Sina announced an English-language version of Sina Weibo would be developed and launched, though content would still be governed by Chinese law. On 11 January 2013, Sina Weibo and Alibaba China (a subsidiary of Alibaba Group) signed a strategic cooperation agreement. With more and more foreign celebrities using Sina Weibo, language translation has become an urgent need for Chinese users who wish to communicate with their idols online, especially Korean. In January 2013, Sina Weibo and NetEase.com announced that they had reached a strategic cooperation agreement. When users browse foreign language content, they can now directly obtain translation results through the YouDao Dictionary. The Sina Weibo financial report in February 2013 showed that its total revenue was approximately US$66 million and that the number of registered users had exceeded the 500 million mark. In April 2013, Sina officially announced that Sina Weibo had signed a strategic cooperation agreement with Alibaba. The two sides conducted in-depth cooperation in areas such as user account interoperability, data exchange, online payment, and internet marketing. At the same time, Sina announced that Alibaba, through its wholly owned subsidiary, had purchased the preferred shares and common shares issued by Sina Weibo Company for US$586 million, which accounted for approximately 18% of Weibo's fully diluted and diluted total shares. === Ownership === On 9 April 2013, Alibaba Group announced that it would acquire 18% of Sina Weibo for US$586 million, with the option to buy up to 30% in the future. Alibaba exercised this option when Weibo was listed on the NASDAQ in April 2014. == Users == According to iResearch's report on 30 March 2011, Sina Weibo had 56.5% of China's microblogging market based on active users and 86.6% based on browsing time over competitors such as Tencent Weibo and Baidu. According to research by Sina Corporation, the number of active users reached over 400 million by Q1 2018, making Sina Weibo the 7th platform with at least 400 million active users, and daily usage increased by 21%. As of 2017, approximately 80% of its users were in their 20s and 30s. The top 100 users had over 485 million followers combined. More than 5,000 companies and 2,700 media organizations in China use Sina Weibo. The site is maintained by a growing microblogging department of 200 employees responsible for technology, design, operations, and marketing. Sina executives invited and persuaded many Chinese celebrities to join the platform. Users now include Asian celebrities, movie stars, singers, famous business and media figures, athletes, scholars, artists, organizations, religious figures, government departments, and officials from Hong Kong, Mainland China, Malaysia, Singapore, Taiwan, and Macau, as well as some famous foreign individuals and organizations, including Kevin Rudd, Boris Johnson, David Cameron, Narendra Modi, Toshiba, and the Germany national football team. Sina Weibo has a verification program for known people and organizations. Once an account is verified, a verification badge is added beside the account name. == Features == Many of Sina Weibo's features resemble those of Twitter. A user may post with a 140-character limit (increased to 2,000 as of January 2016 with the exception of reposts and comments). An analysis of 29 million Weibo posts found the median length was 14 characters. Users may mention or talk to other people using "@UserName" formatting, add hashtags, follow other users to make their posts appear in one's own timeline, re-post with "//@UserName" similar to Twitter's retweet function "RT @UserName", select posts for one's favorites list, and verify the account if the user is a celebrity, brand, business or otherwise of public interest. URLs are automatically shortened using the domain name t.cn, akin to Twitter's t.co. Official and third-party applications can access Sina Weibo from other websites or platforms. Users may: Submit up to 18 images/video files in every post Send personal messages to followers Follow others and be followed Post "stories" like on Instagram React to posts using different emojis Receive monetary rewards that can be used in a digital store linked to Weibo View posts identified as "hot" or popular Display the location they post from Hashtags differ slightly between Sina Weibo and Twitter, using the double-hashtag "#HashName#" format (the lack of spacing between Chinese characters necessitates a closing tag). Users can own a hashtag by requesting hashtag monitoring; the company reviews these requests and responds within one to three days. Once a user owns a hashtag, they have access to a wide variety of functions available only to them on the condition that they remain active (less than 1 post per calendar week revokes these privileges). Additionally, comments appear as a list below each post. A commenter can also choose to re-post the comment, quoting the whole original post, to their own page. Unregistered users can only browse a few post

    Read more →
  • International Medical Education Directory

    International Medical Education Directory

    The International Medical Education Directory (IMED) was a public database of worldwide medical schools. The IMED was published as a joint collaboration of the Educational Commission for Foreign Medical Graduates (ECFMG) and the Foundation for Advancement of International Medical Education and Research (FAIMER). The information available in IMED was derived from data collected by the Educational Commission for Foreign Medical Graduates (ECFMG) throughout its history of evaluating the medical education credentials of international medical graduates. Using these data as a starting point, Foundation for Advancement of International Medical Education and Research (FAIMER) began developing IMED in 2001 and made it publicly available in April 2002. In April 2014, IMED was merged with the Avicenna Directory to create the World Directory of Medical Schools. The World Directory is now the definitive list of medical schools in the world, as IMED and Avicenna were discontinued in 2015.

    Read more →
  • Pwnie Awards

    Pwnie Awards

    The Pwnie Awards are an annual awards ceremony that recognizes both excellence and incompetence in the field of information security, described by SecurityWeek as an event that "recognizes excellence and mocks incompetence in cybersecurity." Winners are selected by a committee of security industry professionals from nominations collected from the information security community. Nominees are announced yearly at Summercon, and the awards themselves are presented at the Black Hat Security Conference. == Origins == The name Pwnie Award is based on the word "pwn", which is hacker slang meaning to "compromise" or "control" based on the previous usage of the word "own" (and it is pronounced similarly). The name "The Pwnie Awards," pronounced as "Pony," is meant to sound like the Tony Awards, an awards ceremony for Broadway theater in New York City. == History == The Pwnie Awards were founded in 2007 by Alexander Sotirov and Dino Dai Zovi following discussions regarding Dino's discovery of a cross-platform QuickTime vulnerability (CVE-2007-2175) and Alexander's discovery of an ANI file processing vulnerability (CVE-2007-0038) in Internet Explorer. == Winners == === 2024 === Most Epic Fail: Crowdstrike for 2024 CrowdStrike incident Best Mobile Bug: Operation Triangulation Lamest Vendor Response: Xiaomi for obstructing Pwn2Own researchers from using their services Best Cryptographic Attack: GoFetch Best Desktop Bug: forcing realtime WebAudio playback in Chrome (CVE-2023-5996) Best Song: Touch Some Grass by UwU Underground Best Privilege Escalation: Windows Streaming Service UAF (CVE-2024-30089) by Valentina Palmiotti (chompie) Best Remote Code Execution: Microsoft Message Queuing (MSMQ) Remote Code Execution Vulnerability (CVE-2024-30080) Most Epic Achievement: Discovery and reverse engineering of the XZ Utils backdoor Most Innovative Research: Let the Cache Cache and Let the WebAssembly Assemble: Knocking’ on Chrome’s Shell by Edouard Bochin, Tao Yan, and Bo Qu Most Underhyped Research: See No Eval: Runtime Dynamic Code Execution in Objective-C === 2023 === Best Desktop Bug: CountExposure! by RyeLv(@b2ahex) Best Cryptographic Attack: Video-based cryptanalysis: Extracting Cryptographic Keys from Video Footage of a Device’s Power LED by Ben Nassi, Etay Iluz, Or Cohen, Ofek Vayner, Dudi Nassi, Boris Zadov, Yuval Elovici Best Song: Clickin’ Most Innovative Research: Inside Apple’s Lightning: Jtagging the iPhone for Fuzzing and Profit Most Under-Hyped Research: Activation Context Cache Poisoning Best Privilege Escalation Bug: URB Excalibur: Slicing Through the Gordian Knot of VMware VM Escapes Best Remote Code Execution Bug: ClamAV RCE Lamest Vendor Response: Three Lessons From Threema: Analysis of a Secure Messenger Most Epic Fail: “Holy fucking bingle, we have the no fly list,” Epic Achievement: Clement Lecigne: 0-days hunter world champion Lifetime Achievement Award: Mudge === 2022 === Lamest Vendor Response: Google's "TAG" response team for "unilaterally shutting down a counterterrorism operation." Epic Achievement: Yuki Chen’s Windows Server-Side RCE Bugs Most Epic Fail: HackerOne Employee Caught Stealing Vulnerability Reports for Personal Gains Best Desktop Bug: Pietro Borrello, Andreas Kogler, Martin Schwarzl, Moritz Lipp, Daniel Gruss, Michael Schwarz for Architecturally Leaking Data from the Microarchitecture Most Innovative Research: Pietro Borrello, Martin Schwarzl, Moritz Lipp, Daniel Gruss, Michael Schwarz for Custom Processing Unit: Tracing and Patching Intel Atom Microcode Best Cryptographic Attack: Hertzbleed: Turning Power Side-Channel Attacks Into Remote Timing Attacks on x86 by Yingchen Wang, Riccardo Paccagnella, Elizabeth Tang He, Hovav Shacham, Christopher Fletcher, David Kohlbrenner Best Remote Code Execution Bug: KunlunLab for Windows RPC Runtime Remote Code Execution (CVE-2022-26809) Best Privilege Escalation Bug: Qidan He of Dawnslab, for Mystique in the House: The Droid Vulnerability Chain That Owns All Your Userspace Best Mobile Bug: FORCEDENTRY Most Under-Hyped Research: Yannay Livneh for Spoofing IP with IPIP Best Song: Dialed Up by Project Mammoth === 2021 === Lamest Vendor Response: Cellebrite, for their response to Moxie, the creator of Signal, reverse-engineering their UFED and accompanying software and reporting a discovered exploit. Epic Achievement: Ilfak Guilfanov, in honor of IDA's 30th Anniversary. Best Privilege Escalation Bug: Baron Samedit of Qualys, for the discovery of a 10-year-old exploit in sudo. Best Song: The Ransomware Song by Forrest Brazeal Best Server-Side Bug: Orange Tsai, for his Microsoft Exchange Server ProxyLogon attack surface discoveries. Best Cryptographic Attack: The NSA for its disclosure of a bug in the verification of signatures in Windows which breaks the certificate trust chain. Most Innovative Research: Enes Göktaş, Kaveh Razavi, Georgios Portokalidis, Herbert Bos, and Cristiano Giuffrida at VUSec for their research on the "BlindSide" Attack. Most Epic Fail: Microsoft, for their failure to fix PrintNightmare. Best Client-Side Bug: Gunnar Alendal's discovery of a buffer overflow on the Samsung Galaxy S20's secure chip. Most Under-Hyped Research: The Qualys Research Team for 21Nails, 21 vulnerabilities in Exim, the Internet's most popular mail server. === 2020 === Best Server-Side Bug: BraveStarr (CVE-2020-10188) – A Fedora 31 netkit telnetd remote exploit (Ronald Huizer') Best Privilege Escalation Bug: checkm8 – A permanent unpatchable USB bootrom exploit for a billion iOS devices. (axi0mX) Epic Achievement: "Remotely Rooting Modern Android Devices" (Guang Gong) Best Cryptographic Attack: Zerologon vulnerability (Tom Tervoort, CVE-2020-1472) Best Client-Side Bug: RCE on Samsung Phones via MMS (CVE-2020-8899 and -16747), a zero click remote execution attack. (Mateusz Jurczyk) Most Under-Hyped Research: Vulnerabilities in System Management Mode (SMM) and Trusted Execution Technology (TXT) (CVE-2019-0151 and -0152) (Gabriel Negreira Barbosa, Rodrigo Rubira Branco, Joe Cihula) Most Innovative Research: TRRespass: When Memory Vendors Tell You Their Chips Are Rowhammer-free, They Are Not. (Pietro Frigo, Emanuele Vannacci, Hasan Hassan, Victor van der Veen, Onur Mutlu, Cristiano Giuffrida, Herbert Bos, Kaveh Razavi) Most Epic Fail: Microsoft; for the implementation of Elliptic-curve signatures which allowed attackers to generate private pairs for public keys of any signer, allowing HTTPS and signed binary spoofing. (CVE-2020-0601) Best Song: Powertrace by Rebekka Aigner, Daniel Gruss, Manuel Weber, Moritz Lipp, Patrick Radkohl, Andreas Kogler, Maria Eichlseder, ElTonno, tunefish, Yuki and Kater Lamest Vendor Response: Daniel J. Bernstein (CVE-2005-1513) === 2019 === Best Server-Side Bug: Orange Tsai and Meh Chang, for their SSL VPN research. Most Innovative Research: Vectorized Emulation Brandon Falk Best Cryptographic Attack: \m/ Dr4g0nbl00d \m/ Mathy Vanhoef, Eyal Ronen Lamest Vendor Response: Bitfi Most Over-hyped Bug: Allegations of Supermicro hardware backdoors, Bloomberg Most Under-hyped Bug: Thrangrycat, (Jatin Kataria, Red Balloon Security) === 2018 === Most Innovative Research: Spectre/Meltdown (Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, Yuval Yarom) Best Privilege Escalation Bug: Spectre/Meltdown (Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, Yuval Yarom) Lifetime Achievement: Michał Zalewski Best Cryptographic Attack: ROBOT - Return Of Bleichenbacher’s Oracle Threat Hanno Böck, Juraj Somorovsky, Craig Young Lamest Vendor Response: Bitfi hardware crypto-wallet, after the "unhackable" device was hacked to extract the keys required to steal coins and rooted to play Doom. === 2017 === Epic Achievement: Federico Bento for Finally getting TIOCSTI ioctl attack fixed Most Innovative Research: ASLR on the line Ben Gras, Kaveh Razavi, Erik Bosman, Herbert Bos, Cristiano Giuffrida Best Privilege Escalation Bug: DRAMMER Victor van der Veen, Yanick Fratantonio, Martina Lindorfer, Daniel Gruss, Clementine Maurice, Giovanni Vigna, Herbert Bos, Kaveh Razavi, Cristiano Giuffrida Best Cryptographic Attack: The first collision for full SHA-1 Marc Stevens, Elie Bursztein, Pierre Karpman, Ange Albertini, Yarik Markov Lamest Vendor Response: Lennart Poettering - for mishandling security vulnerabilities most spectacularly for multiple critical Systemd bugs Best Song: Hello (From the Other Side) - Manuel Weber, Michael Schwarz, Daniel Gruss, Moritz Lipp, Rebekka Aigner === 2016 === Most Innovative Research: Dedup Est Machina: Memory Deduplication as an Advanced Exploitation Vector Erik Bosman, Kaveh Razavi, Herbert Bos, Cristiano Giuffrida Lifetime Achievement: Peiter Zatko aka Mudge Best Cryptographic Attack: DROWN attack Nimrod Aviram et al. Best Song: Cyberlier - Katie Mous

    Read more →
  • Text normalization

    Text normalization

    Text normalization is the process of transforming text into a single canonical form that it might not have had before. Normalizing text before storing or processing it allows for separation of concerns, since input is guaranteed to be consistent before operations are performed on it. Text normalization requires being aware of what type of text is to be normalized and how it is to be processed afterwards; there is no all-purpose normalization procedure. == Applications == Text normalization is frequently used when converting text to speech. Numbers, dates, acronyms, and abbreviations are non-standard "words" that need to be pronounced differently depending on context. For example: "$200" would be pronounced as "two hundred dollars" in English, but as "lua selau tālā" in Samoan. "vi" could be pronounced as "vie," "vee," or "the sixth" depending on the surrounding words. Text can also be normalized for storing and searching in a database. For instance, if a search for "resume" is to match the word "résumé," then the text would be normalized by removing diacritical marks; and if "john" is to match "John", the text would be converted to a single case. To prepare text for searching, it might also be stemmed (e.g. converting "flew" and "flying" both into "fly"), canonicalized (e.g. consistently using American or British English spelling), or have stop words removed. == Techniques == For simple, context-independent normalization, such as removing non-alphanumeric characters or diacritical marks, regular expressions would suffice. For example, the sed script sed ‑e "s/\s+/ /g" inputfile would normalize runs of whitespace characters into a single space. More complex normalization requires correspondingly complicated algorithms, including domain knowledge of the language and vocabulary being normalized. Among other approaches, text normalization has been modeled as a problem of tokenizing and tagging streams of text and as a special case of machine translation. == Textual scholarship == In the field of textual scholarship and the editing of historic texts, the term "normalization" implies a degree of modernization and standardization – for example in the extension of scribal abbreviations and the transliteration of the archaic glyphs typically found in manuscript and early printed sources. A normalized edition is therefore distinguished from a diplomatic edition (or semi-diplomatic edition), in which some attempt is made to preserve these features. The aim is to strike an appropriate balance between, on the one hand, rigorous fidelity to the source text (including, for example, the preservation of enigmatic and ambiguous elements); and, on the other, producing a new text that will be comprehensible and accessible to the modern reader. The extent of normalization is therefore at the discretion of the editor, and will vary. Some editors, for example, choose to modernize archaic spellings and punctuation, but others do not. An edition of a text might be normalized based on internal criteria, where orthography is standardized according to the language of the original, or external criteria, where the norms of a different time period are applied. For an example of the latter, a published edition of a medieval Icelandic manuscript might be normalized to the conventions of modern Icelandic, or it might be normalized to Classical Old Icelandic. Standards of normalization vary based on language of the edition as well as the specific conventions of the publisher.

    Read more →
  • BeyondCorp

    BeyondCorp

    BeyondCorp is an implementation of zero-trust computer security concepts creating a zero trust network. It is created by Google. == Background == It was created in response to the 2009 Operation Aurora. An open source implementation inspired by Google's research paper on an access proxy is known as "transcend". Google documented its Zero Trust journey from 2014 to 2018 through a series of articles in the journal ;login:. Google called their ZT network "BeyondCorp". Google implemented a Zero Trust architecture on a large scale, and relied on user and device credentials, regardless of location. Data was encrypted and protected from managed devices. Unmanaged devices, such as BYOD, were not given access to the BeyondCorp resources. == Design and technology == BeyondCorp utilized a zero trust security model, which is a relatively new security model that it assumes that all devices and users are potentially compromised. This is in contrast to traditional security models, which rely on firewalls and other perimeter defenses to protect sensitive data. === Trust === The corporate network grants no inherent trust, and all internal apps are accessed via the BeyondCorp system, regardless of whether the user is in a Google office or working remotely. BeyondCorp is related to Zero Trust architecture as it implements a true Zero Trust network, where all access is granted on identity, device, and authentication, based on robust underlying device and identity data sources. BeyondCorp works by using a number of security policies including authentication, authorization, and access control to ensure that only authorized users can access corporate resources. Authentication verifies the identity of the user, authorization determines whether the user has permission to access the requested resource, and access control policies restrict what the user can do with the resource. ==== Trust Inferrer ==== One of the main components in BeyondCorp's implementation is the Trust Inferrer. The Trust Inferrer is a security component (typically software) that looks at information about a user's device, like a computer or phone, to decide how much it can be trusted to access certain resources like important company documents. The Trust Inferrer checks things like the security of the device, whether it has the right software installed, and if it belongs to an authorized user. Based on all this information, the Trust Inferrer decides what the device can access and what it can't. === Security mechanisms === Unlike traditional VPNs, BeyondCorp's access policies are based on information about a device, its state, and its associated user. BeyondCorp considers both internal networks and external networks to be completely untrusted, and gates access to applications by dynamically asserting and enforcing levels, or “tiers,” of access. === Device Inventory Database === BeyondCorp utilized a Device Inventory Database and Device Identity that uniquely identifies a device through a digital certificate. Any changes to the device are recorded in the Device Inventory Database. The certificate is used to uniquely identify a device; however, additional information is required to grant access privileges to a resource. === Access Control Engine === Another important component of BeyondCorp's implementation is the Access Control Engine. Think of this as the brain of the Zero Trust architecture. The Access Control Engine is like a traffic cop standing at an intersection. Its job is to make sure that only authorized devices and users are allowed to access specific resources (like files or applications) on the network. It checks the access policy (the rules that say who can access what), the device's state (like whether it has the right software updates or security settings), and the resources being requested. Then it makes a decision on whether to grant or deny access based on all of this information. It helps ensure that only the right people and devices are allowed access to the network, which helps keep things secure. The Access Control Engine utilizes the output from the Trust Inferrer and other data that is fed into its system. == Usage == One of the first things Google did to implement a Zero Trust architecture was to capture and analyze network traffic. The purpose of analyzing the traffic was to build a baseline of what typical network traffic looked like. In doing so, BeyondCorp also discovered unusual, unexpected, and unauthorized traffic. This was very useful because it gave the BeyondCorp engineers critical information that assisted them in reengineering the system in a secure manner. Some of the benefits BeyondCorp realized by adopting a Zero Trust architecture include the ability to allow their employees to work securely from any location. It reduces the risk of data breaches since data and applications are protected and users and devices are constantly being verified. The Zero Trust architecture is scalable and can be adapted to the changing needs of the businesses and their users. Especially relevant in today's work-from-home era, BeyondCorp allows employees to access enterprise resources securely from any location, without the need for traditional VPNs.

    Read more →
  • Patent visualisation

    Patent visualisation

    Patent visualisation is an application of information visualisation. The number of patents has been increasing, encouraging companies to consider intellectual property as a part of their strategy. Patent visualisation, like patent mapping, is used to quickly view a patent portfolio. Software dedicated to patent visualisation began to appear in 2000, for example Aureka from Aurigin (now owned by Thomson Reuters). Many patent and portfolio analytics platforms, such as Questel, Patent Forecast, PatSnap, Patentcloud, Relecura, and Patent iNSIGHT Pro, offer options to visualise specific data within patent documents by creating topic maps, priority maps, IP Landscape reports, etc. Software converts patents into infographics or maps, to allow the analyst to "get insight into the data" and draw conclusions. Also called patinformatics, it is the "science of analysing patent information to discover relationships and trends that would be difficult to see when working with patent documents on a one-and-one basis". Patents contain structured data (like publication numbers) and unstructured text (like title, abstract, claims and visual info). Structured data are processed by data-mining and unstructured data are processed with text-mining. == Data mining == The main step in processing structured information is data-mining, which emerged in the late 1980s. Data mining involves statistics, artificial intelligence, and machine learning. Patent data mining extracts information from the structured data of the patent document. These structured data are bibliographic fields such as location, date or status. === Structured fields === === Advantages === Data mining allows study of filing patterns of competitors and locates main patent filers within a specific area of technology. This approach can be helpful to monitor competitors' environments, moves and innovation trends and gives a macro view of a technology status. == Text-mining == === Principle === Text mining is used to search through unstructured text documents. This technique is widely used on the Internet, it has had success in bioinformatics and now in the intellectual property environment. Text mining is based on a statistical analysis of word recurrence in a corpus. An algorithm extracts words and expressions from title, summary and claims and gathers them by declension. "And" and "if" are labeled as non-information bearing words and are stored in the stopword list. Stoplists can be specialised in order to create an accurate analysis. Next, the algorithm ranks the words by weight, according to their frequency in the patent's corpus and the document frequency containing this word. The score for each word is calculated using a formula such as: W e i g h t = T e r m F r e q u e n c y D o c u m e n t F r e q u e n c y = F r e q u e n c y o f t h e w o r d o r e x p r e s s i o n i n t h e T e x t S e a N u m b e r o f d o c u m e n t s c o n t a i n i n g t h e e x p r e s s i o n o r w o r d {\displaystyle Weight={\frac {Term\ Frequency}{Document\ Frequency}}={\frac {Frequency\ of\ the\ word\ or\ expression\ in\ the\ Text\ Sea}{Number\ of\ documents\ containing\ the\ expression\ or\ word}}} A frequently used word in several documents has less weight than a word used frequently in a few patents. Words under a minimum weight are eliminated, leaving a list of pertinent words or descriptors. Each patent is associated to the descriptors found in the selected document. Further, in the process of clusterisation, these descriptors are used as subsets, in which the patent are regrouped or as tags to place the patents in predetermined categories, for example keywords from International Patent Classifications. Four text parts can be processed with text-mining : Title Abstract Claim Patent Full-Text Software offer different combinations but title, abstract and claim are generally the most used, providing a good balance between interferences and relevancy. === Advantages === Text-mining can be used to narrow a search or quickly evaluate a patent corpus. For instance, if a query produces irrelevant documents, a multi-level clustering hierarchy identifies them in order to delete them and refine the search. Text-mining can also be used to create internal taxonomies specific to a corpus for possible mapping. == Visualisations == Allying patent analysis and informatic tools offers an overview of the environment through value-added visualisations. As patents contain structured and unstructured information, visualisations fall in two categories. Structured data can be rendered with data mining in macrothematic maps and statistical analysis. Unstructured information can be shown in like clouds, cluster maps and 2D keyword maps. === Data mining visualisation === === Text mining visualisation === === Visualisation for both data-mining and text-mining === Mapping visualisations can be used for both text-mining and data-mining results. == Uses == What patent visualisation can highlight: Competitors Partners New innovations Technologic environment description Networks Field application: R&D strategy management Competitive intelligence Licensing Strategy

    Read more →
  • Glyph (data visualization)

    Glyph (data visualization)

    In the context of data visualization, a glyph is any marker, such as an arrow or similar marking, used to specify part of a visualization. This is a representation to visualize data where the data set is presented as a collection of visual objects. These visual objects are collectively called a glyph. It helps visualizing data relation in data analysis, statistics, etc. by using any custom notation. In the context of data visualization, a glyph is the visual representation of a piece of data where the attributes of a graphical entity are dictated by one or more attributes of a data record. == Constructing glyphs == Glyph construction can be a complex process when there are many dimensions to be represented in the visualization. Maguire et al proposed a taxonomy based approach to glyph-design that uses a tree to guide the visual encodings used to representation various data items. Duffy et al created perhaps one of the most complex glyph representations with their representation of sperm movement.

    Read more →
  • Model compression

    Model compression

    Model compression is a machine learning technique for reducing the size of trained models. Large models can achieve high accuracy, but often at the cost of significant resource requirements. Compression techniques aim to compress models without significant performance reduction. Smaller models require less storage space, and consume less memory and compute during inference. Compressed models enable deployment on resource-constrained devices such as smartphones, embedded systems, edge computing devices, and consumer electronics computers. Efficient inference is also valuable for large corporations that serve large model inference over an API, allowing them to reduce computational costs and improve response times for users. Model compression is not to be confused with knowledge distillation, in which a smaller "student" model is trained to imitate the input-output behavior of a larger "teacher" model (as opposed to using the "teacher"'s trained parameters or the "teacher"'s training targets). == Techniques == Several techniques are employed for model compression. === Pruning === Pruning sparsifies a large model by setting some parameters to exactly zero. This effectively reduces the number of parameters. This allows the use of sparse matrix operations, which are faster than dense matrix operations. Pruning criteria can be based on magnitudes of parameters, the statistical pattern of neural activations, Hessian values, etc. === Quantization === Quantization reduces the numerical precision of weights and activations. For example, instead of storing weights as 32-bit floating-point numbers, they can be represented using 8-bit integers. Low-precision parameters take up less space, and takes less compute to perform arithmetic with. It is also possible to quantize some parameters more aggressively than others, so for example, a less important parameter can have 8-bit precision while another, more important parameter, can have 16-bit precision. Inference with such models requires mixed-precision arithmetic. Quantized models can also be used during training (rather than after training). PyTorch implements automatic mixed-precision (AMP), which performs autocasting, gradient scaling, and loss scaling. === Low-rank factorization === Weight matrices can be approximated by low-rank matrices. Let W {\displaystyle W} be a weight matrix of shape m × n {\displaystyle m\times n} . A low-rank approximation is W ≈ U V T {\displaystyle W\approx UV^{T}} , where U {\displaystyle U} and V {\displaystyle V} are matrices of shapes m × k , n × k {\displaystyle m\times k,n\times k} . When k {\displaystyle k} is small, this both reduces the number of parameters needed to represent W {\displaystyle W} approximately, and accelerates matrix multiplication by W {\displaystyle W} . Low-rank approximations can be found by singular value decomposition (SVD). The choice of rank for each weight matrix is a hyperparameter, and jointly optimized as a mixed discrete-continuous optimization problem. The rank of weight matrices may also be pruned after training, taking into account the effect of activation functions like ReLU on the implicit rank of the weight matrices. == Training == Model compression may be decoupled from training, that is, a model is first trained without regard for how it might be compressed, then it is compressed. However, it may also be combined with training. The "train big, then compress" method trains a large model for a small number of training steps (less than it would be if it were trained to convergence), then heavily compress the model. It is found that at the same compute budget, this method results in a better model than lightly compressed, small models. In Deep Compression, the compression has three steps. First loop (pruning): prune all weights lower than a threshold, then finetune the network, then prune again, etc. Second loop (quantization): cluster weights, then enforce weight sharing among all weights in each cluster, then finetune the network, then cluster again, etc. Third step: Use Huffman coding to losslessly compress the model. The SqueezeNet paper reported that Deep Compression achieved a compression ratio of 35 on AlexNet, and a ratio of ~10 on SqueezeNets.

    Read more →
  • NRD Cyber Security

    NRD Cyber Security

    NRD Cyber Security is a Lithuanian company that provides cybersecurity solutions, consulting, and other services. The organization specializes in CSIRT and SOC creation, modernization and training. It has helped to establish national and sectorial CSIRTs around the world, including countries, such as Bangladesh, Egypt, Bhutan, Kosovo, Malawi and others. NRD Cyber Security was found in 2013 to provide quality cybersecurity services to nations and organizations. In 2018 it was included in The Deloitte Technology Fast 50 in Europe list. In 2024 it was awarded the #98 place in MSSP Alert Top 250 world's managed security service providers. The company is a member of various cybersecurity organizations, such as Forum of Incident Response and Security Teams (FIRST), The Global Forum on Cyber Expertise (GFCE), Unicrons Lt. It is a strategic partner of The Global Cyber Security Capacity Centre (GCSCC) at University of Oxford.

    Read more →
  • Line integral convolution

    Line integral convolution

    In scientific visualization, line integral convolution (LIC) is a method to visualize a vector field (such as fluid motion) at high spatial resolutions. The LIC technique was first proposed by Brian Cabral and Leith Casey Leedom in 1993. In LIC, discrete numerical line integration is performed along the field lines (curves) of the vector field on a uniform grid. The integral operation is a convolution of a filter kernel and an input texture, often white noise. In signal processing, this process is known as a discrete convolution. == Overview == Traditional visualizations of vector fields use small arrows or lines to represent vector direction and magnitude. This method has a low spatial resolution, which limits the density of presentable data and risks obscuring characteristic features in the data. More sophisticated methods, such as streamlines and particle tracing techniques, can be more revealing but are highly dependent on proper seed points. Texture-based methods, like LIC, avoid these problems since they depict the entire vector field at point-like (pixel) resolution. Compared to other integration-based techniques that compute field lines of the input vector field, LIC has the advantage that all structural features of the vector field are displayed, without the need to adapt the start and end points of field lines to the specific vector field. In other words, it shows the topology of the vector field. In user testing, LIC was found to be particularly good for identifying critical points. == Algorithm == === Informal description === LIC causes output values to be strongly correlated along the field lines, but uncorrelated in orthogonal directions. As a result, the field lines contrast each other and stand out visually from the background. Intuitively, the process can be understood with the following example: the flow of a vector field can be visualized by overlaying a fixed, random pattern of dark and light paint. As the flow passes by the paint, the fluid picks up some of the paint's color, averaging it with the color it has already acquired. The result is a randomly striped, smeared texture where points along the same streamline tend to have a similar color. Other physical examples include: whorl patterns of paint, oil, or foam on a river visualisation of magnetic field lines using randomly distributed iron filings fine sand being blown by strong wind === Formal mathematical description === Although the input vector field and the result image are discretized, it pays to look at it from a continuous viewpoint. Let v {\displaystyle \mathbf {v} } be the vector field given in some domain Ω {\displaystyle \Omega } . Although the input vector field is typically discretized, we regard the field v {\displaystyle \mathbf {v} } as defined in every point of Ω {\displaystyle \Omega } , i.e. we assume an interpolation. Streamlines, or more generally field lines, are tangent to the vector field in each point. They end either at the boundary of Ω {\displaystyle \Omega } or at critical points where v = 0 {\displaystyle \mathbf {v} =\mathbf {0} } . For the sake of simplicity, critical points and boundaries are ignored in the following. A field line σ {\displaystyle {\boldsymbol {\sigma }}} , parametrized by arc length s {\displaystyle s} , is defined as d σ ( s ) d s = v ( σ ( s ) ) | v ( σ ( s ) ) | . {\displaystyle {\frac {d{\boldsymbol {\sigma }}(s)}{ds}}={\frac {\mathbf {v} ({\boldsymbol {\sigma }}(s))}{|\mathbf {v} ({\boldsymbol {\sigma }}(s))|}}.} Let σ r ( s ) {\displaystyle {\boldsymbol {\sigma }}_{\mathbf {r} }(s)} be the field line that passes through the point r {\displaystyle \mathbf {r} } for s = 0 {\displaystyle s=0} . Then the image gray value at r {\displaystyle \mathbf {r} } is set to D ( r ) = ∫ − L / 2 L / 2 k ( s ) N ( σ r ( s ) ) d s {\displaystyle D(\mathbf {r} )=\int _{-L/2}^{L/2}k(s)N({\boldsymbol {\sigma }}_{\mathbf {r} }(s))ds} where k ( s ) {\displaystyle k(s)} is the convolution kernel, N ( r ) {\displaystyle N(\mathbf {r} )} is the noise image, and L {\displaystyle L} is the length of field line segment that is followed. D ( r ) {\displaystyle D(\mathbf {r} )} has to be computed for each pixel in the LIC image. If carried out naively, this is quite expensive. First, the field lines have to be computed using a numerical method for solving ordinary differential equations, like a Runge–Kutta method, and then for each pixel the convolution along a field line segment has to be calculated. The final image will normally be colored in some way. Typically, some scalar field in Ω {\displaystyle \Omega } (like the vector length) is used to determine the hue, while the grayscale LIC output determines the brightness. Different choices of convolution kernels and random noise produce different textures; for example, pink noise produces a cloudy pattern where areas of higher flow stand out as smearing, suitable for weather visualization. Further refinements in the convolution can improve the quality of the image. === Programming description === Algorithmically, LIC takes a vector field and noise texture as input, and outputs a texture. The process starts by generating in the domain of the vector field a random gray level image at the desired output resolution. Then, for every pixel in this image, the forward and backward streamline of a fixed arc length is calculated. The value assigned to the current pixel is computed by a convolution of a suitable convolution kernel with the gray levels of all the noise pixels lying on a segment of this streamline. This creates a gray level LIC image. == Versions == === Basic === Basic LIC images are grayscale images, without color and animation. While such LIC images convey the direction of the field vectors, they do not indicate orientation; for stationary fields, this can be remedied by animation. Basic LIC images do not show the length of the vectors (or the strength of the field). === Color === The length of the vectors (or the strength of the field) is usually coded in color; alternatively, animation can be used. === Animation === LIC images can be animated by using a kernel that changes over time. Samples at a constant time from the streamline would still be used, but instead of averaging all pixels in a streamline with a static kernel, a ripple-like kernel constructed from a periodic function multiplied by a Hann function acting as a window (in order to prevent artifacts) is used. The periodic function is then shifted along the period to create an animation. === Fast LIC (FLIC) === The computation can be significantly accelerated by re-using parts of already computed field lines, specializing to a box function as convolution kernel k ( s ) {\displaystyle k(s)} and avoiding redundant computations during convolution. The resulting fast LIC method can be generalized to convolution kernels that are arbitrary polynomials. === Oriented Line Integral Convolution (OLIC) === Because LIC does not encode flow orientation, it cannot distinguish between streamlines of equal direction but opposite orientation. Oriented Line Integral Convolution (OLIC) solves this issue by using a ramp-like asymmetric kernel and a low-density noise texture. The kernel asymmetrically modulates the intensity along the streamline, producing a trace that encodes orientation; the low-density of the noise texture prevents smeared traces from overlapping, aiding readability. Fast Rendering of Oriented Line Integral Convolution (FROLIC) is a variation that approximates OLIC by rendering each trace in discrete steps instead of as a continuous smear. === Unsteady Flow LIC (UFLIC) === For time-dependent vector fields (unsteady flow), a variant called Unsteady Flow LIC has been designed that maintains the coherence of the flow animation. An interactive GPU-based implementation of UFLIC has been presented. === Parallel === Since the computation of an LIC image is expensive but inherently parallel, the process has been parallelized and, with availability of GPU-based implementations, interactive on PCs. === Multidimensional === Note that the domain Ω {\displaystyle \Omega } does not have to be a 2D domain: the method is applicable to higher dimensional domains using multidimensional noise fields. However, the visualization of the higher-dimensional LIC texture is problematic; one way is to use interactive exploration with 2D slices that are manually positioned and rotated. The domain Ω {\displaystyle \Omega } does not have to be flat either; the LIC texture can be computed also for arbitrarily shaped 2D surfaces in 3D space. == Applications == This technique has been applied to a wide range of problems since it first was published in 1993, both scientific and creative, including: Representing vector fields: visualization of steady (time-independent) flows (streamlines) visual exploration of 2D autonomous dynamical systems wind mapping water flow mapping Artistic effects for image generation and stylization: pencil drawing (auto

    Read more →
  • Microsoft Support Diagnostic Tool

    Microsoft Support Diagnostic Tool

    The Microsoft Support Diagnostic Tool (MSDT) is a legacy service in Microsoft Windows that allows Microsoft technical support agents to analyze diagnostic data remotely for troubleshooting purposes. In April 2022 it was observed to have a security vulnerability that allowed remote code execution which was being exploited to attack computers in Russia and Belarus, and later against the Tibetan government in exile. Microsoft advised a temporary workaround of disabling the MSDT by editing the Windows registry. == Use == When contacting support the user is told to run MSDT and given a unique "passkey" which they enter. They are also given an "incident number" to uniquely identify their case. The MSDT can also be run offline which will generate a .CAB file which can be uploaded from a computer with an internet connection. == Security vulnerabilities == === Follina === Follina is the name given to a remote code execution (RCE) vulnerability, a type of arbitrary code execution (ACE) exploit, in the Microsoft Support Diagnostic Tool (MSDT) which was first widely publicized on May 27, 2022, by a security research group called Nao Sec. This exploit allows a remote attacker to use a Microsoft Office document template to execute code via MSDT. This works by exploiting the ability of Microsoft Office document templates to download additional content from a remote server. If the size of the downloaded content is large enough it causes a buffer overflow allowing a payload of Powershell code to be executed without explicit notification to the user. On May 30 Microsoft issued CVE-2022-30190 with guidance that users should disable MSDT. Malicious actors have been observed exploiting the bug to attack computers in Russia and Belarus since April, and it is believed Chinese state actors had been exploiting it to attack the Tibetan government in exile based in India. Microsoft patched this vulnerability in its June 2022 patches. === DogWalk === The DogWalk vulnerability is a remote code execution (RCE) vulnerability in the Microsoft Support Diagnostic Tool (MSDT). It was first reported in January 2020, but Microsoft initially did not consider it to be a security issue. However, the vulnerability was later exploited in the wild, and Microsoft released a patch for it in August 2022. The vulnerability is caused by a path traversal vulnerability in the sdiageng.dll library. This vulnerability allows an attacker to trick a victim into opening a malicious diagcab file, which is a type of Windows cabinet file that is used to store support files. When the diagcab file is opened, it triggers the MSDT tool, which then executes the malicious code. Originally discovered by Mitja Kolsek, the DogWalk vulnerability is caused by a path traversal vulnerability in the sdiageng.dll library. This vulnerability allows an attacker to trick a victim into opening a malicious diagcab file, which is a type of Windows cabinet file that is used to store support files. When the diagcab file is opened, it triggers the MSDT tool, which then executes the malicious code. The vulnerability is exploited by creating a malicious diagcab file that contains a specially crafted path. This path contains a sequence of characters that is designed to exploit the path traversal vulnerability in the sdiageng.dll library. When the diagcab file is opened, the MSDT tool will attempt to follow the path. However, the path will contain characters that are not valid for a Windows path. This will cause the MSDT tool to crash. When the MSDT tool crashes, it will generate a memory dump. This memory dump will contain the malicious code that was executed by the MSDT tool. The attacker can then use this memory dump to extract the malicious code and execute it on their own computer. == Retirement == Microsoft will no longer be supporting the Windows legacy inbox Troubleshooters. In 2025, Microsoft will remove the MSDT platform entirely. Get Help is the replacement tool. == Windows versions == Windows 7 Windows 8.1 Windows 10 Windows 11 (up to 22H2) Future versions and feature upgrades will deprecate the MSDT after May 23, 2023.

    Read more →
  • Pandemonium architecture

    Pandemonium architecture

    Pandemonium architecture is a theory in cognitive science that describes how visual images are processed by the brain. It has applications in artificial intelligence and pattern recognition. The theory was introduced by the artificial intelligence pioneer Oliver Selfridge in his 1959 paper "Pandemonium - A Paradigm for Learning". It describes the process of object recognition as the exchange of signals within a hierarchical system of detection and association, the elements of which Selfridge metaphorically termed "demons". This model is now recognized as the basis of visual perception in cognitive science. Pandemonium architecture arose in response to the inability of template matching theories to offer a biologically plausible explanation of the image constancy phenomenon. Contemporary researchers praise this architecture for its elegancy and creativity; that the idea of having multiple independent systems (e.g., feature detectors) working in parallel to address the image constancy phenomena of pattern recognition is powerful yet simple. The basic idea of the pandemonium architecture is that a pattern is first perceived in its parts before the "whole". Pandemonium architecture was one of the first computational models in pattern recognition. Although not perfect, the pandemonium architecture influenced the development of modern connectionist, artificial intelligence, and word recognition models. == History == Most research in perception has been focused on the visual system, investigating the mechanisms of how we see and understand objects. A critical function of our visual system is its ability to recognize patterns, but the mechanism by which this is achieved is unclear. The earliest theory that attempted to explain how we recognize patterns is the template matching model. According to this model, we compare all external stimuli against an internal mental representation. If there is "sufficient" overlap between the perceived stimulus and the internal representation, we will "recognize" the stimulus. Although some machines follow a template matching model (e.g., bank machines verifying signatures and accounting numbers), the theory is critically flawed in explaining the phenomena of image constancy: we can easily recognize a stimulus regardless of the changes in its form of presentation (e.g., T and T are both easily recognized as the letter T). It is highly unlikely that we have a stored template for all of the variations of every single pattern. As a result of the biological plausibility criticism of the template matching model, feature detection models began to rise. In a feature detection model, the image is first perceived in its basic individual elements before it is recognized as a whole object. For example, when we are presented with the letter A, we would first see a short horizontal line and two slanted long diagonal lines. Then we would combine the features to complete the perception of A. Each unique pattern consists of different combination of features, which means those that are formed with the same features will generate the same recognition. That is, regardless of how we rotate the letter A, is still perceived as the letter A. It is easy for this sort of architecture to account for the image constancy phenomena because you only need to "match" at the basic featural level, which is presumed to be limited and finite, thus biologically plausible. The best known feature detection model is called the pandemonium architecture. == Pandemonium architecture == The pandemonium architecture was originally developed by Oliver Selfridge in the late 1950s. The architecture is composed of different groups of "demons" working independently to process the visual stimulus. Each group of demons is assigned to a specific stage in recognition, and within each group, the demons work in parallel. There are four major groups of demons in the original architecture. The concept of feature demons, that there are specific neurons dedicated to perform specialized processing is supported by research in neuroscience. Hubel and Wiesel found there were specific cells in a cat's brain that responded to specific lengths and orientations of a line. Similar findings were discovered in frogs, octopuses and a variety of other animals. Octopuses were discovered to be only sensitive to verticality of lines, whereas frogs demonstrated a wider range of sensitivity. These animal experiments demonstrate that feature detectors seem to be a very primitive development. That is, it did not result from the higher cognitive development of humans. Not surprisingly, there is also evidence that the human brain possesses these elementary feature detectors as well. Moreover, this architecture is capable of learning, similar to a back-propagation styled neural network. The weight between the cognitive and feature demons can be adjusted in proportion to the difference between the correct pattern and the activation from the cognitive demons. To continue with our previous example, when we first learned the letter R, we know is composed of a curved, long straight, and a short angled line. Thus when we perceive those features, we perceive R. However, the letter P consists of very similar features, so during the beginning stages of learning, it is likely for this architecture to mistakenly identify R as P. But through constant exposure of confirming R's features to be identified as R, the weights of R's features to P are adjusted so the P response becomes inhibited (e.g., learning to inhibit the P response when a short angled line is detected). In principle, a pandemonium architecture can recognize any pattern. As mentioned earlier, this architecture makes error predictions based on the amount of overlapping features. Such as, the most likely error for R should be P. Thus, in order to show this architecture represents the human pattern recognition system we must put these predictions into test. Researchers have constructed scenarios where various letters are presented in situations that make them difficult to identify; then types of errors were observed, which was used to generate confusion matrices: where all of the errors for each letter are recorded. Generally, the results from these experiments matched the error predictions from the pandemonium architecture. Also as a result of these experiments, some researchers have proposed models that attempted to list all of the basic features in the Roman alphabet. == Criticism == A major criticism of the pandemonium architecture is that it adopts a completely bottom-up processing: recognition is entirely driven by the physical characteristics of the targeted stimulus. This means that it is unable to account for any top-down processing effects, such as context effects (e.g., pareidolia), where contextual cues can facilitate (e.g., word superiority effect: it is relatively easier to identify a letter when it is part of a word than in isolation) processing. However, this is not a fatal criticism to the overall architecture, because is relatively easy to add a group of contextual demons to work along with the cognitive demons to account for these context effects. Although the pandemonium architecture is built on the fact that it can account for the image constancy phenomena, some researchers have argued otherwise; and pointed out that the pandemonium architecture might share the same flaws from the template matching models. For example, the letter H is composed of 2 long vertical lines and a short horizontal line; but if we rotate the H 90 degrees in either direction, it is now composed of 2 long horizontal lines and a short vertical line. In order to recognize the rotated H as H, we would need a rotated H cognitive demon. Thus we might end up with a system that requires a large number of cognitive demons in order to produce accurate recognition, which would lead to the same biological plausibility criticism of the template matching models. However, it is rather difficult to judge the validity of this criticism because the pandemonium architecture does not specify how and what features are extracted from incoming sensory information, it simply outlines the possible stages of pattern recognition. But of course that raises its own questions, to which it is almost impossible to criticize such a model if it does not include specific parameters. Also, the theory appears to be rather incomplete without defining how and what features are extracted, which proves to be especially problematic with complex patterns (e.g., extracting the weight and features of a dog). Some researchers have also pointed out that the evidence supporting the pandemonium architecture has been very narrow in its methodology. Majority of the research that supports this architecture has often referred to its ability to recognize simple schematic drawings that are selected from a small finite set (e.g., letters in the Roman alphabet). Evidence from these types of exper

    Read more →
  • Starlight Information Visualization System

    Starlight Information Visualization System

    Starlight is a software product originally developed at Pacific Northwest National Laboratory and now by Future Point Systems. It is an advanced visual analysis environment. In addition to using information visualization to show the importance of individual pieces of data by showing how they relate to one another, it also contains a small suite of tools useful for collaboration and data sharing, as well as data conversion, processing, augmentation and loading. The software, originally developed for the intelligence community, allows users to load data from XML files, databases, RSS feeds, web services, HTML files, Microsoft Word, PowerPoint, Excel, CSV, Adobe PDF, TXT files, etc. and analyze it with a variety of visualizations and tools. The system integrates structured, unstructured, geospatial, and multimedia data, offering comparisons of information at multiple levels of abstraction, simultaneously and in near real-time. In addition Starlight allows users to build their own named entity-extractors using a combination of algorithms, targeted normalization lists and regular expressions in the Starlight Data Engineer (SDE). As an example, Starlight might be used to look for correlations in a database containing records about chemical spills. An analyst could begin by grouping records according to the cause of the spill to reveal general trends. Sorting the data a second time, they could apply different colors based on related details such as the company responsible, age of equipment or geographic location. Maps and photographs could be integrated into the display, making it even easier to recognize connections among multiple variables. Starlight has been deployed to both the Iraq and Afghanistan wars and used on a number of large-scale projects. PNNL began developing Starlight in the mid-1990s, with funding from the Land Information Warfare Agency, a part of the Army Intelligence and Security Command and continued developed at the laboratory with funding from the NSA and the CIA. Starlight integrates visual representations of reports, radio transcripts, radar signals, maps and other information. The software system was recently honored with an R&D 100 Award for technical innovation. In 2006 Future Point Systems, a Silicon Valley startup, acquired rights to jointly develop and distribute the Starlight product in cooperation with the Pacific Northwest National Laboratory. The software is now also used outside of the military/intelligence communities in a number of commercial environments.

    Read more →
  • Data commingling

    Data commingling

    Data commingling, in computer science, occurs when different items or kinds of data are stored in such a way that they become commonly accessible when they are supposed to remain separated. In cloud computing, this can occur where different customer data sits on the same server. Data that is commingled can present a security vulnerability. Data commingling can also occur due to high speed data transmission mixing. In this situation, data of one security level can inadvertently or purposely be mixed with data of a lower or higher security level on the same transmission portal. Portal vehicles can be wire, fiber optics, microwave or various radio frequency transmission portals. This commingling can cause breaches of security and become a source of legal issues to any entity, corporation or individual. Data commingling can also occur when personal computers and personal software programs are used for business, security, government, etc. uses. In the early formulation stages of entities, non-profit or profit corporations, LLC's, LLP's, etc., the creation and use of stand-alone computers and stand-alone networks, "absolutely unconnected" to involved individuals, is the easiest, and safest way to prevent Data Commingling.

    Read more →