AI Content Detection Tools

AI Content Detection Tools — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Phase correlation

    Phase correlation

    Phase correlation is an approach to estimate the relative translative offset between two similar images (digital image correlation) or other data sets. It is commonly used in image registration and relies on a frequency-domain representation of the data, usually calculated by fast Fourier transforms. The term is applied particularly to a subset of cross-correlation techniques that isolate the phase information from the Fourier-space representation of the cross-correlogram. == Example == The following image demonstrates the usage of phase correlation to determine relative translative movement between two images corrupted by independent Gaussian noise. The image was translated by (20,23) pixels. Accordingly, one can clearly see a peak in the phase-correlation representation at approximately (20,23). == Method == Given two input images g a {\displaystyle \ g_{a}} and g b {\displaystyle \ g_{b}} : Apply a window function (e.g., a Hamming window) on both images to reduce edge effects (this may be optional depending on the image characteristics). Then, calculate the discrete 2D Fourier transform of both images. G a = F { g a } , G b = F { g b } {\displaystyle \ \mathbf {G} _{a}={\mathcal {F}}\{g_{a}\},\;\mathbf {G} _{b}={\mathcal {F}}\{g_{b}\}} Calculate the cross-power spectrum by taking the complex conjugate of the second result, multiplying the Fourier transforms together elementwise, and normalizing this product elementwise. R = G a ∘ G b ∗ | G a ∘ G b ∗ | {\displaystyle \ R={\frac {\mathbf {G} _{a}\circ \mathbf {G} _{b}^{}}{|\mathbf {G} _{a}\circ \mathbf {G} _{b}^{}|}}} Where ∘ {\displaystyle \circ } is the Hadamard product (entry-wise product) and the absolute values are taken entry-wise as well. Written out entry-wise for element index ( j , k ) {\displaystyle (j,k)} : R j k = G a , j k ⋅ G b , j k ∗ | G a , j k ⋅ G b , j k ∗ | {\displaystyle \ R_{jk}={\frac {G_{a,jk}\cdot G_{b,jk}^{}}{|G_{a,jk}\cdot G_{b,jk}^{}|}}} Obtain the normalized cross-correlation by applying the inverse Fourier transform. r = F − 1 { R } {\displaystyle \ r={\mathcal {F}}^{-1}\{R\}} Determine the location of the peak in r {\displaystyle \ r} . ( Δ x , Δ y ) = arg ⁡ max ( x , y ) { r } {\displaystyle \ (\Delta x,\Delta y)=\arg \max _{(x,y)}\{r\}} === Subpixel registration === Commonly, interpolation methods are used to estimate the peak location in the cross-correlogram to non-integer values, despite the fact that the data are discrete, and this procedure is often termed 'subpixel registration'. A large variety of subpixel interpolation methods are given in the technical literature. Common peak interpolation methods such as parabolic interpolation have been used, and the OpenCV computer vision package uses a centroid-based method, though these generally have inferior accuracy compared to more sophisticated methods. Because the Fourier representation of the data has already been computed, it is especially convenient to use the Fourier shift theorem with real-valued (sub-integer) shifts for this purpose, which essentially interpolates using the sinusoidal basis functions of the Fourier transform. An especially popular FT-based estimator is given by Foroosh et al. In this method, the subpixel peak location is approximated by a simple formula involving peak pixel value and the values of its nearest neighbors, where r ( 0 , 0 ) {\displaystyle r_{(0,0)}} is the peak value and r ( 1 , 0 ) {\displaystyle r_{(1,0)}} is the nearest neighbor in the x direction (assuming, as in most approaches, that the integer shift has already been found and the comparand images differ only by a subpixel shift). Δ x = r ( 1 , 0 ) r ( 1 , 0 ) ± r ( 0 , 0 ) {\displaystyle \ \Delta x={\frac {r_{(1,0)}}{r_{(1,0)}\pm r_{(0,0)}}}} The Foroosh et al. method is quite fast compared to most methods, though it is not always the most accurate. Some methods shift the peak in Fourier space and apply non-linear optimization to maximize the correlogram peak, but these tend to be very slow since they must apply an inverse Fourier transform or its equivalent in the objective function. It is also possible to infer the peak location from phase characteristics in Fourier space without the inverse transformation, as noted by Stone. These methods usually use a linear least squares (LLS) fit of the phase angles to a planar model. The long latency of the phase angle computation in these methods is a disadvantage, but the speed can sometimes be comparable to the Foroosh et al. method depending on the image size. They often compare favorably in speed to the multiple iterations of extremely slow objective functions in iterative non-linear methods. Since all subpixel shift computation methods are fundamentally interpolative, the performance of a particular method depends on how well the underlying data conform to the assumptions in the interpolator. This fact also may limit the usefulness of high numerical accuracy in an algorithm, since the uncertainty due to interpolation method choice may be larger than any numerical or approximation error in the particular method. Subpixel methods are also particularly sensitive to noise in the images, and the utility of a particular algorithm is distinguished not only by its speed and accuracy but its resilience to the particular types of noise in the application. == Rationale == The method is based on the Fourier shift theorem. Let the two images g a {\displaystyle \ g_{a}} and g b {\displaystyle \ g_{b}} be circularly-shifted versions of each other: g b ( x , y ) = d e f g a ( ( x − Δ x ) mod M , ( y − Δ y ) mod N ) {\displaystyle \ g_{b}(x,y)\ {\stackrel {\mathrm {def} }{=}}\ g_{a}((x-\Delta x){\bmod {M}},(y-\Delta y){\bmod {N}})} (where the images are M × N {\displaystyle \ M\times N} in size). Then, the discrete Fourier transforms of the images will be shifted relatively in phase: G b ( u , v ) = G a ( u , v ) e − 2 π i ( u Δ x M + v Δ y N ) {\displaystyle \mathbf {G} _{b}(u,v)=\mathbf {G} _{a}(u,v)e^{-2\pi i({\frac {u\Delta x}{M}}+{\frac {v\Delta y}{N}})}} One can then calculate the normalized cross-power spectrum to factor out the phase difference: R ( u , v ) = G a G b ∗ | G a G b ∗ | = G a G a ∗ e 2 π i ( u Δ x M + v Δ y N ) | G a G a ∗ e 2 π i ( u Δ x M + v Δ y N ) | = G a G a ∗ e 2 π i ( u Δ x M + v Δ y N ) | G a G a ∗ | = e 2 π i ( u Δ x M + v Δ y N ) {\displaystyle {\begin{aligned}R(u,v)&={\frac {\mathbf {G} _{a}\mathbf {G} _{b}^{}}{|\mathbf {G} _{a}\mathbf {G} _{b}^{}|}}\\&={\frac {\mathbf {G} _{a}\mathbf {G} _{a}^{}e^{2\pi i({\frac {u\Delta x}{M}}+{\frac {v\Delta y}{N}})}}{|\mathbf {G} _{a}\mathbf {G} _{a}^{}e^{2\pi i({\frac {u\Delta x}{M}}+{\frac {v\Delta y}{N}})}|}}\\&={\frac {\mathbf {G} _{a}\mathbf {G} _{a}^{}e^{2\pi i({\frac {u\Delta x}{M}}+{\frac {v\Delta y}{N}})}}{|\mathbf {G} _{a}\mathbf {G} _{a}^{}|}}\\&=e^{2\pi i({\frac {u\Delta x}{M}}+{\frac {v\Delta y}{N}})}\end{aligned}}} since the magnitude of an imaginary exponential always is one, and the phase of G a G a ∗ {\displaystyle \ \mathbf {G} _{a}\mathbf {G} _{a}^{}} always is zero. The inverse Fourier transform of a complex exponential is a Dirac delta function, i.e. a single peak: r ( x , y ) = δ ( x + Δ x , y + Δ y ) {\displaystyle \ r(x,y)=\delta (x+\Delta x,y+\Delta y)} This result could have been obtained by calculating the cross correlation directly. The advantage of this method is that the discrete Fourier transform and its inverse can be performed using the fast Fourier transform, which is much faster than correlation for large images. === Benefits === Unlike many spatial-domain algorithms, the phase correlation method is resilient to noise, occlusions, and other defects typical of medical or satellite images. The method can be extended to determine rotation and scaling differences between two images by first converting the images to log-polar coordinates. Due to properties of the Fourier transform, the rotation and scaling parameters can be determined in a manner invariant to translation. === Limitations === In practice, it is more likely that g b {\displaystyle \ g_{b}} will be a simple linear shift of g a {\displaystyle \ g_{a}} , rather than a circular shift as required by the explanation above. In such cases, r {\displaystyle \ r} will not be a simple delta function, which will reduce the performance of the method. In such cases, a window function (such as a Gaussian or Tukey window) should be employed during the Fourier transform to reduce edge effects, or the images should be zero padded so that the edge effects can be ignored. If the images consist of a flat background, with all detail situated away from the edges, then a linear shift will be equivalent to a circular shift, and the above derivation will hold exactly. The peak can be sharpened by using edge or vector correlation. For periodic images (such as a chessboard or picket fence), phase correlation may yield ambiguous results with several peaks in the resulting output. == Applications == Phase correlation is the preferred m

    Read more →
  • Forward anonymity

    Forward anonymity

    Forward anonymity is a property of a cryptographic system which prevents an attacker who has recorded past encrypted communications from discovering its contents and participants in the future. This property is analogous to forward secrecy. An example of a system which uses forward anonymity is a public key cryptography system, where the public key is well-known and used to encrypt a message, and an unknown private key is used to decrypt it. In this system, one of the keys is always said to be compromised, but messages and their participants are still unknown by anyone without the corresponding private key. In contrast, an example of a system which satisfies the perfect forward secrecy property is one in which a compromise of one key by an attacker (and consequent decryption of messages encrypted with that key) does not undermine the security of previously used keys. Forward secrecy does not refer to protecting the content of the message, but rather to the protection of keys used to decrypt messages. == History == Originally introduced by Whitfield Diffie, Paul van Oorschot, and Michael James Wiener to describe a property of STS (station-to-station protocol) involving a long term secret, either a private key or a shared password. == Public Key Cryptography == Public Key Cryptography is a common form of a forward anonymous system. It is used to pass encrypted messages, preventing any information about the message from being discovered if the message is intercepted by an attacker. It uses two keys, a public key and a private key. The public key is published, and is used by anyone to encrypt a plaintext message. The Private key is not well known, and is used to decrypt cyphertext. Public key cryptography is known as an asymmetric decryption algorithm because of different keys being used to perform opposing functions. Public key cryptography is popular because, while it is computationally easy to create a pair of keys, it is extremely difficult to determine the private key knowing only the public key. Therefore, the public key being well known does not allow messages which are intercepted to be decrypted. This is a forward anonymous system because one compromised key (the public key) does not compromise the anonymity of the system. == Web of Trust == A variation of the public key cryptography system is a Web of trust, where each user has both a public and private key. Messages sent are encrypted using the intended recipient's public key, and only this recipient's private key will decrypt the message. They are also signed with the senders private key. This creates added security where it becomes more difficult for an attacker to pretend to be a user, as the lack of a private key signature indicates a non-trusted user. == Limitations == A forward anonymous system does not necessarily mean a wholly secure system. A successful cryptanalysis of a message or sequence of messages can still decode the information without the use of a private key or long term secret. == News == Forward anonymity, along with other privacy-protecting measures, received a burst of media attention after the leak of classified information by Edward Snowden, beginning in June, 2013, which indicated that the NSA and FBI, through specially crafted backdoors in software and computer systems, were conducting mass surveillance over large parts of the population of both the United States (see Mass surveillance in the United States), Europe, Asia, and other parts of the world. They justified this practice as an aid to catch predatory pedophiles. Opponents to this practice argue that leaving in a back door to law enforcement increases the risk of attackers being able to decrypt information, as well as questioning its legality under the US Constitution, specifically being a form of illegal Search and Seizure.

    Read more →
  • IEBus

    IEBus

    IEBus (Inter Equipment Bus) is a communication bus specification "between equipments within a vehicle or a chassis" of Renesas Electronics. It defines OSI model layer 1 and layer 2 specification. IEBus is mainly used for car audio and car navigations, which established de facto standard in Japan, though SAE J1850 is major in United States. IEBus is also used in some vending machines, which major customer is Fuji Electric. Each button on the vending machine has an IEBus ID, i.e. has a controller. Detailed specification is disclosed to licensees only, but protocol analyzers are provided from some test equipment vendors. Its modulation method is PWM (Pulse-Width Modulation) with 6.00 MHz base clock originally, but most of automotive customers use 6.291 MHz, and physical layer is a pair of differential signalling harness. Its physical layer adopts half-duplex, asynchronous, and multi-master communication with carrier-sense multiple access with collision detection (CSMA/CD) for medium access control. It allows for up to fifty units on one bus over a maximum length of 150 meters. Two differential signalling lines are used with Bus+ / Bus− naming, sometimes labeled as Data(+) / Data(−). It is sometimes described as "IE-BUS", "IE-Bus," or "IE Bus," but these are incorrect. In formal, it is "IEBus." IEBus® and Inter Equipment Bus® are registered trademark symbols of Renesas Electronics Corporation, formerly NEC Electronics Corporation, (JPO: Reg. No.2552418 and 2552419, respectively). == History == In the middle of '80s, semiconductor unit of NEC Corporation, currently Renesas Electronics, started the study for increasing demands for automotive audio systems. IEBus is introduced as a solution for the distributed control system. In the late 1980s, several similar specifications, including the Domestic Digital Bus (D2B), the Japanese Home Bus (HBS), and the European Home System (EHS) are proposed by different companies or organizations. These were once discussed as IEC 61030, but it was withdrawn in 2006. IEBus is also a similar specification (refer to "Transfer signal format" section), but not listed in these criteria. As the result, IEBus becomes a de facto standard of car audio in Japan. Regarding the Domestic Digital Bus (D2B), it is re-defined as D2B Optical by Mercedes-Benz independently. As for Japanese Home Bus System (HBS), it is defined in 1988 as Home Bus System Standard Specification, ET-2101 by JEITA and REEA (Radio Engineering & Electronics Assiation) in Japan. It is being used by several Japanese air conditioner manufacturers (for example, M-Net from Mitsubishi and the P1/P2 or F1/F2 bus from Daikin). Fujitsu provided HBPC (Home Bus Protocol Controller) chip as MB86046B. But it is unclear whether Fujitsu (currently, Cypress) still manufactures this HBPC LSI as of 2018. Mitsumi Electric provides the MM1007 and MM1192 driver ICs for HBS. The HBS specification is also discussed in the Echonet Consortium. In 2014, a utility model patent for protocol converter from HBS to RS-485 is granted in China as "CN204006496U." Regarding the replacement of IEBus, a paper by Hyundai Autonet, currently Hyundai Mobis, describes as follows. "In communication methods for digital input capable amplifiers, Inter Equipment Bus (IEBus) was used in early times, but for now, Controller Area Network (CAN) is mainly used." == Protocol overview == A master talks to a slave. Each unit has a master and a slave address register. Only one device can talk on the bus at any given time. There is a pecking order for the types of communications which will take precedence over another. Each communication from master to slave must be replied to by the slave going back to the master with acknowledge bits each of those show ACK or NAK. If the master does not receive the ACK within a predefined time allowance for a mode, it drops the communication and returns to its standby (listen) mode. Detailed specification of OSI model layer 2 is disclosed to licensees only, but protocol analyzers are provided from some test equipment vendors. In 2012, one of Chinese manufacturer's patent is granted as "CN202841169U". An open-source software emulator called "IEBus Studio" exists on a repository of SourceForge, but the last update was on 2008-02-24. Another open-source analyzer software called "IEBusAnalyzer" is available on GitHub repository. Some hobbyist made some tools also. === Physical layer (OSI model layer 1) specification overview === From μPD6708 data sheet. and μPD78098B Subseries user's manual, hardware. Communication system Half-duplex asynchronous communication Multi-master system All the units connected to the IEBus can transfer data to the other units. Broadcast communication function (communication between one unit and multiple units) Normally, communication is individually carried out from one unit to another. By using the broadcast communication function, however, communication can be executed from one unit to plural units as follows: Group broadcast communication: Broadcast communication to group units Simultaneous broadcast communication: Broadcast communication to all units Effective transmission rate The effective transmission rate can be selected from the following three communication modes: Mixture of the plural of modes in the same bus line is not allowed. Correct communication between different base clock is not possible. Access control CSMA/CD (Carrier Sense Multiple Access with Collision Detection) The priority of occupying IEBus is as follows: «1» Broadcast communication takes precedence over individual communication. «2» The lower the master address, the higher the priority. Communication scale Number of units: 50 MAX. Cable length: 150 m MAX. (when a twisted pair cable is used) Load capacity: MAX. 8000 pF; between Bus+ and Bus−, (6.000000 MHz base clock) MAX. 7100 pF; between Bus+ and Bus−, (6.291456 MHz base clock) Terminating resistor: 120 Ω Logic level Logic 1: Low level. Voltage difference between Bus+ and Bus− is under 20mV Logic 0: High Level. Voltage difference between Bus+ and Bus− is over 120mV In-phase input voltage high: Bus+ ≤ (VDD-1.0) V, Bus− ≥ 1.0 V === Transfer signal format === From μPD6708 data sheet. and μPD78098B Subseries user's manual, hardware. This frame format is much similar to that of Domestic Digital Bus (D2B). All fields are MSB first. ==== Functions of Control bits ==== === Bit format === Each IEBus bit consists of four periods. Preparation period: The first or subsequent low-level (logic "1") period Synchronization period: Next high-level (logic "0") period Data period: Period indicating value of bit; ether low-level (logic "1") or high-level (logic "0") Stop period: The last low-level (logic "1") period Synchronization is done by each bit. Time lengths of the synchronization period and data period are almost the same. The time of the entire bits' and each bit's specification, related to the time of each period allocated to it, differ depending both on the type of the transmit bit and on whether the unit is the master or a slave unit. == Automotive manufacturers using IEBus == Each manufacturer has its own name, but it is not an alias of IEBus. Those are specifications of wire harness which comprise control cables based on IEBus, OSI model layer 3 and above communication protocol, audio cables, interconnection couplers, and so on. === Pioneer === Pioneer Corporation employed IEBus for its original branded car audio in early '90s. In its earlier stage, it was used just for control bus between the head unit in dashboard and the CD changer usually placed in trunk room. Nowadays, the specification includes connection between head units, navigation systems, rear speaker systems, and so on. IP-Bus: Wire harness specification. === Toyota === Pioneer Corporation pushed Toyota Motor Corporation to adopt IEBus as the genuine parts. In 1994, Toyota decided to employ IEBus for its genuine specification, but it is slightly different from that of Pioneer. It is named as AVC-LAN. AVC-LAN: Wire harness specification, based on mode 2. === Honda/Acura === Pioneer Corporation also pushed Honda Motor. Honda also decided to adopt IEBus as its genuine parts specification just after Toyota do so. GA-NET II: Wire harness specification. Honda Music Link: Honda genuine gadget to connect Apple Inc. products. A hobbyist made touch screen controller on Acura TSX for a Car PC installed in the trunk. === Sirius XM Satellite Radio === Sirius XM Satellite Radio is a satellite broadcasting radio operator in US. Its digital media receiver equipment utilizes IEBus. == Evaluation boards == === SAKURA board === GR-SAKUKRA board and GR-SAKURA-FULL board are Renesas official promotion boards of RX63N chip, which enables IEBus mode 0 and 1, but not mode 2, i.e. not available for Toyota AVC-LAN. They are an Arduino pin compatible low-price ones, suitable for hobbyists. Their color of printed circuit board is SAKURA in Japanese, which means cherry blossom. To e

    Read more →
  • Protecting Kids From Social Media Act

    Protecting Kids From Social Media Act

    Protecting Kids on Social Media Act or HB 1891 is an American law that was introduced by William Lamberth of Sumner County, Tennessee and was signed into law by Tennessee's governor on May 2, 2024. The bill requires social media websites such as X, YouTube, TikTok, Facebook and others to verify the age of users and if those users are under 18, they must have parental consent. == Progress == The law passed the Tennessee State Legislature with little opposition: the bill had only two no votes in the House from Aftyn Behn and Vincent B. Dixie, and it had zero no votes in the Senate. == Bill summary == Every social media company must verify the age of new users after the law takes effect, and if the user had created an account before the law took effect, they must verify the age of the person attempting to access the account within 14 days. If the new user or the user who originally owned an account is under 18 years of age, they must get parental consent and the third party or social media company must not retain the data from the age verification process or obtaining parental consent. Parents who are account holders of those under 18 can view the privacy settings, set daily time restrictions, and implement breaks during which the minor cannot access the account. The law is enforced by the Attorney General of Tennessee and went into effect on January 1, 2025. == Lawsuit == On October 3, 2024, the trade association NetChoice filed a lawsuit against Tennessee Attorney General Jonathan Skrmetti in the Middle District Court of Tennessee, claiming that the law violates the First Amendment. The Judge for the case is William L. Campbell Jr. An initial case management conference was originally scheduled for December 4, 2024, however it was delayed because of the Supreme Court case United States v. Skrmetti, recommending that the conference be delayed after January 20, 2025. On February 14, 2025, Judge Eli Richardson denied NetChoice's motion for a temporary restraining order because it would disrupt the status quo of the case.

    Read more →
  • International Road Traffic and Accident Database

    International Road Traffic and Accident Database

    The International Road Traffic and Accident Database (IRTAD) is an initiative dedicated to compiling and analyzing global road crash data. It is managed by the International Transport Forum (ITF) under the auspices of its permanent working group, which specializes in road safety, commonly referred to as the IRTAD Group. The primary objective of IRTAD is to provide a robust empirical basis for international comparisons in the field of road safety and to offer data to support the formulation of effective road safety policies. == Data availability == A portion of the data gathered by IRTAD is accessible for free through the OECD statistics website, however the remaining data requires a subscription for access. == History == The IRTAD database was originally started in 1988 by Germany's Federal Institution for Roads (BASt) in response to demands for international comparative data. It was later taken over and expanded by the International Transport Forum and has grown to be an important resource for comparing road safety metrics between countries worldwide, although mostly in the developed world. Every year, the ITF publishes comparative and country-by-country road safety data gathered for the IRTAD database and analysed by the IRTAD Group in the ITF Road Safety Annual Report, informally known as "IRTAD Report". Over the years, the IRTAD acronym has come to stand not only for the database, but also for the Traffic Safety Data and Analysis Group (usually referred to as IRTAD Group). The IRTAD Group is the International Transport Forum's permanent working group on road safety. It consists of a group of international road safety experts drawn from national road administrations, road safety research institutes, International organizations, automobile associations, insurance companies, car manufacturers and other road safety stakeholders. The IRTAD Group is a major forum for international road safety collaboration and exchange of best practices. Its focus is on improving road safety data as a basis for targeting interventions that are effective in reducing the number of road deaths and serious traffic injuries. The work of IRTAD, among that of others, has spawned the creation of road safety observatories for different world regions: the Ibero-American Road Safety Observatory Archived 2020-06-28 at the Wayback Machine (OISEVI), the African Road Safety Observatory Archived 2020-06-10 at the Wayback Machine, and the South-East Asian Road Safety Observatory. The ITF supports OISEVI through the Spanish-language IRTAD-LAC database and is actively involved in the implementation of the African and South East-Asian observatories. The genesis of the road safety observatory movement dates back to 2008, when the ITF, via IRTAD, began to facilitate twinning between countries striving to improve their road safety record and countries with high road safety performance. The initial twinning was between Jamaica and the United Kingdom. This work was supported by the World Bank, the Inter-American Development Bank (IADB) and the FIA Foundation. The twinning between Argentina and Spain in 2011 led to the creation of OISEVI. To this day, the ITF supports OISEVI through the Spanish-language IRTAD-LAC database. In 2006, the ITF set up Safer City Streets, a global traffic safety network for cities that replicates the successful IRTAD approach for urban road safety.

    Read more →
  • Vans challenge

    Vans challenge

    The Vans challenge is a viral internet challenge that began in March 2019 where people show their Vans shoes landing right-side up after tossing them in the air. The viral sensation reportedly started after a Twitter user shared a video of the occurrence, which was captioned: “Did you know it doesn’t matter how you throw your Vans they will land facing up.” Since then, multiple people on social media posted similar videos of them throwing their Vans in the air and landing right-side up, along with Crocs, UGG boots, and other popular shoes. This theory proved false, as these shoes have not always landed facing up after tossing them.

    Read more →
  • Software token

    Software token

    A software token (a.k.a. soft token) is a piece of a two-factor authentication security device that may be used to authorize the use of computer services. Software tokens are stored on a general-purpose electronic device such as a desktop computer, laptop, PDA, or mobile phone and can be duplicated. (Contrast hardware tokens, where the credentials are stored on a dedicated hardware device and therefore cannot be duplicated — absent physical invasion of the device) Because software tokens are something one does not physically possess, they are exposed to unique threats based on duplication of the underlying cryptographic material - for example, computer viruses and software attacks. Both hardware and software tokens are vulnerable to bot-based man-in-the-middle attacks, or to simple phishing attacks in which the one-time password provided by the token is solicited, and then supplied to the genuine website in a timely manner. Software tokens do have benefits: there is no physical token to carry, they do not contain batteries that will run out, and they are cheaper than hardware tokens. == Security architecture == There are two primary architectures for software tokens: shared secret and public-key cryptography. For a shared secret, an administrator will typically generate a configuration file for each end-user. The file will contain a username, a personal identification number, and the secret. This configuration file is given to the user. The shared secret architecture is potentially vulnerable in a number of areas. The configuration file can be compromised if it is stolen and the token is copied. With time-based software tokens, it is possible to borrow an individual's PDA or laptop, set the clock forward, and generate codes that will be valid in the future. Any software token that uses shared secrets and stores the PIN alongside the shared secret in a software client can be stolen and subjected to offline attacks. Shared secret tokens can be difficult to distribute, since each token is essentially a different piece of software. Each user must receive a copy of the secret, which can create time constraints. Some newer software tokens rely on public-key cryptography, or asymmetric cryptography. This architecture eliminates some of the traditional weaknesses of software tokens, but does not affect their primary weakness (ability to duplicate). A PIN can be stored on a remote authentication server instead of with the token client, making a stolen software token no good unless the PIN is known as well. However, in the case of a virus infection, the cryptographic material can be duplicated and then the PIN can be captured (via keylogging or similar) the next time the user authenticates. If there are attempts made to guess the PIN, it can be detected and logged on the authentication server, which can disable the token. Using asymmetric cryptography also simplifies implementation, since the token client can generate its own key pair and exchange public keys with the server.

    Read more →
  • Localhost

    Localhost

    In computer networking, localhost is a hostname that refers to the current computer used to access it. The name localhost is reserved for loopback purposes. It is used to access the network services that are running on the host via the loopback network interface. Using the loopback interface bypasses any local network interface hardware. == Loopback == The local loopback mechanism may be used to run a network service on a host without requiring a physical network interface, or without making the service accessible from the networks the computer may be connected to. For example, a locally installed website may be accessed from a Web browser by the URL http://localhost to display its home page. IPv4 network standards reserve the entire address block 127.0.0.0/8 (more than 16 million addresses) for loopback purposes. That means any packet sent to any of those addresses is looped back. The address 127.0.0.1 is the standard address for IPv4 loopback traffic; the rest are not supported by all operating systems. However, they can be used to set up multiple server applications on the host, all listening on the same port number. In the IPv6 addressing architecture there is only a single address assigned for loopback: ::1. The standard precludes the assignment of that address to any physical interface, as well as its use as the source or destination address in any packet sent to remote hosts. == Name resolution == The name localhost normally resolves to the IPv4 loopback address 127.0.0.1, and to the IPv6 loopback address ::1. This resolution is normally configured by the following lines in the operating system's hosts file: 127.0.0.1 localhost ::1 localhost The name may also be resolved by Domain Name System (DNS) servers, but there are special considerations governing the use of this name: An IPv4 or IPv6 address query for the name localhost must always resolve to the respective loopback address. Applications may resolve the name to a loopback address themselves, or pass it to the local name resolver mechanisms. When a name resolver receives an address (A or AAAA) query for localhost, it should return the appropriate loopback addresses, and negative responses for any other requested record types. Queries for localhost should not be sent to caching name servers. To avoid burdening the Domain Name System root servers with traffic, caching name servers should never request name server records for localhost, or forward resolution to authoritative name servers. When authoritative name servers receive queries for 'localhost' in spite of the provisions mentioned above, they should resolve them appropriately. In addition to the mapping of localhost to the loopback addresses (127.0.0.1 and ::1), localhost may also be mapped to other IPv4 (loopback) addresses and it is also possible to assign other, or additional, names to any loopback address. The mapping of localhost to addresses other than the designated loopback address range in the hosts file or in DNS is not guaranteed to have the desired effect, as applications may map the name internally. In the Domain Name System, the name .localhost is reserved as a top-level domain name, originally set aside to avoid confusion with the hostname localhost. Domain name registrars are precluded from delegating domain names in the top-level .localhost domain. == Historical notes == In 1981, the block 127.0.0.0/8 got a 'reserved' status, as not to assign it as a general purpose class A IP network. This block was officially assigned for loopback purposes in 1986. Its purpose as a Special Use IPv4 Address block was confirmed in 1994,, 2002, 2010,, and last in 2013. From the outset, in 1995, the single IPv6 loopback address ::1 was defined. Its purpose and definition was unchanged in 1998,, 2003,, and up to the current definition, in 2006. == Packet processing == The processing of any packet sent to a loopback address, is implemented in the link layer of the TCP/IP stack. Such packets are never passed to any network interface controller (NIC) or hardware device driver and must not appear outside of a computing system, or be routed by any router. This permits software testing and local services, even in the absence of any hardware network interfaces. Looped-back packets are distinguished from any other packets traversing the TCP/IP stack only by the special IP address they were addressed to. Thus, the services that ultimately receive them respond according to the specified destination. For example, an HTTP service could route packets addressed to 127.0.0.99:80 and 127.0.0.100:80 to different Web servers, or to a single server that returns different web pages. To simplify such testing, the hosts file may be configured to provide appropriate names for each address. Packets received on a non-loopback interface with a loopback source or destination address must be dropped. Such packets are sometimes referred to as Martian packets. As with any other bogus packets, they may be malicious and any problems they might cause can be avoided by applying bogon filtering. == Special cases == The releases of the MySQL database differentiate between the use of the hostname localhost and the use of the addresses 127.0.0.1 and ::1. When using localhost as the destination in a client connector interface of an application, the MySQL application programming interface connects to the database using a Unix domain socket, while a TCP connection via the loopback interface requires the direct use of the explicit address. One notable exception to the use of the 127.0.0.0/8 addresses is their use in Multiprotocol Label Switching (MPLS) traceroute error detection, in which their property of not being routable provides a convenient means to avoid delivery of faulty packets to end users.

    Read more →
  • Artisse AI

    Artisse AI

    Artisse AI is a Hong Kong-based technology company founded by William Wu. The company developed a mobile photography application using generative artificial intelligence to transform selfies into high-quality, personalized images. The app allows users to visualize themselves in various scenarios, outfits, and hairstyles, and they can adjust lighting and ambiance to match their preferences. The app launched in 2023 across multiple markets, including the United States, United Kingdom, Japan, South Korea, Canada, and Australia. By January 2024, users had generated over 5 million images. That same month, the company secured $6.7 million in seed funding to support product development and marketing. == History == Artisse was originally founded in South Korea in 2022 by William Wu. The early concept was connected to a virtual idol initiative developed in collaboration with a K-pop agency, intended to support Wu's blockchain gaming business. The project later evolved into a standalone AI photography application. The current version of the Artisse app was developed following the company's relocation to Hong Kong in 2022. In January 2024, Artisse secured $6.7 million in seed funding, led by The London Fund. The investment was aimed at supporting product development, marketing, and user acquisition. Artisse uses an AI algorithm to create hyperrealistic images from uploaded photos. The app generates personalized images by combining generative AI technology, a global pool of licensed talent, and finished art services. The app works with individual users and businesses, offering professional-grade photos and advertisement images. According to the British newspaper Evening Standard the company has developed the world's first and most advanced AI photographer. It captures 15-30 photos of the user and generates 2D images, placing them in various outfits and locations worldwide. === Catheron Gaming === Artisse AI originated from Catheon Gaming, a blockchain gaming and entertainment company founded in 2021 by William Wu. Catheon Gaming published more than 30 Web3 titles in its first year, developed a blockchain game distribution platform, and offered advisory services to external developers. In 2022, HSBC and KPMG listed Catheon Gaming among the "Top 10 Emerging Giants" in the Asia–Pacific region, selected from a pool of more than 6,000 startups. In June 2023, Catheon Gaming was rebranded as Artisse Interactive, creating two divisions: Artisse Gaming, which continued blockchain and Web3 game development, and Artisse AI, which focused on generative photography technology. == Technology == Artisse uses a proprietary generative AI model combined with open-source imaging frameworks and diffusion models. Users are prompted to upload between 15 and 30 personal images, allowing the AI to train a personalized model in 30 to 40 minutes. After training, the app generates new images based on either textual or visual prompts, with options to adjust elements such as clothing, hairstyles, lighting, and backgrounds. To enhance realism, the app integrates augmented reality features and image refinement tools. The company has introduced features to address representation issues related to body shape and skin tone, although concerns persist about the ethical implications of altering personal traits. == Products == === Artisse mobile app === Available on iOS and Android platforms in 35 languages. Users initially receive 25 free images, after which the app adopts a subscription pricing model ranging from approximately $6 to $30 per month. By early 2024, the app reported around 4,000 paying subscribers out of more than 200,000 downloads. === Business and enterprise services === Artisse provides B2B solutions for creating marketing imagery and partners with agencies like Iconic Management to enable cost-effective virtual photoshoots. Additional features in development include virtual try-on capabilities and augmented reality integration for fashion retail. == Reception == Media coverage has noted the app's photorealistic image outputs with some sources highlighting its ease of use. However, concerns have been raised regarding image authenticity, algorithmic biases, and the potential impact on professional photography and modeling. Artisse has been widely covered by media outlets including TechCrunch, PetaPixel, Forbes Australia, and The Evening Standard. These publications discussed the app's integration of generative AI technology within the consumer photography space, its growing market influence, and its rapid adoption by users worldwide.

    Read more →
  • Manufacturing Automation Protocol

    Manufacturing Automation Protocol

    Manufacturing Automation Protocol (MAP) was a computer network standard released in 1982 for interconnection of devices from multiple manufacturers. It was developed by General Motors to combat the proliferation of incompatible communications standards used by suppliers of automation products such as programmable controllers. By 1985 demonstrations of interoperability were carried out and 21 vendors offered MAP products. In 1986 the Boeing corporation merged its Technical Office Protocol with the MAP standard, and the combined standard was referred to as "MAP/TOP". The standard was revised several times between the first issue in 1982 and MAP 3.0 in 1987, with significant technical changes that made interoperation between different revisions of the standard difficult. Although promoted and used by manufacturers such as General Motors, Boeing, and others, it lost market share to the contemporary Ethernet standard and was not widely adopted. Difficulties included changing protocol specifications, the expense of MAP interface links, and the speed penalty of a token-passing network. The token bus network protocol used by MAP became standardized as IEEE standard 802.4 but this committee disbanded in 2004 due to lack of industry attention.

    Read more →
  • Reverse proxy

    Reverse proxy

    In computer networks, a reverse proxy or surrogate server is a proxy server that appears to any client to be an ordinary web server, but in reality merely acts as an intermediary that forwards the client's requests to one or more ordinary web servers. Reverse proxies help increase scalability, performance, resilience, and security, but they also carry a number of risks. Companies that run web servers often set up reverse proxies to facilitate the communication between an Internet user's browser and the web servers. An important advantage of doing so is that the web servers can be hidden behind a firewall on a company-internal network, and only the reverse proxy needs to be directly exposed to the Internet. Reverse proxy servers are implemented in popular open-source web servers. Dedicated reverse proxy servers are used by some of the biggest websites on the Internet. A reverse proxy is capable of tracking IP addresses of requests that are relayed through it as well as reading and/or modifying any non-encrypted traffic. However, this implies that anyone who has compromised the server could do so as well. Reverse proxies differ from forward proxies, which are used when the client is restricted to a private, internal network and asks a forward proxy to retrieve resources from the public Internet. == Uses == Large websites and content delivery networks use reverse proxies, together with other techniques, to balance the load between internal servers. Reverse proxies can keep a cache of static content, which further reduces the load on these internal servers and the internal network. It is also common for reverse proxies to add features such as compression or TLS encryption to the communication channel between the client and the reverse proxy. Reverse proxies can inspect HTTP headers, which, for example, allows them to present a single IP address to the Internet while relaying requests to different internal servers based on the URL of the HTTP request. Reverse proxies can hide the existence and characteristics of origin servers. This can make it more difficult to determine the actual location of the origin server / website and, for instance, more challenging to initiate legal action such as takedowns or block access to the website, as the IP address of the website may not be immediately apparent. Additionally, the reverse proxy may be located in a different jurisdiction with different legal requirements, further complicating the takedown process. Application firewall features can protect against common web-based attacks, like a denial-of-service attack (DoS) or distributed denial-of-service attacks (DDoS). Without a reverse proxy, removing malware or initiating takedowns (while simultaneously dealing with the attack) on one's own site, for example, can be difficult. In the case of secure websites, a web server may not perform TLS encryption itself, but instead offload the task to a reverse proxy that may be equipped with TLS acceleration hardware. (See TLS termination proxy.) A reverse proxy can distribute the load from incoming requests to several servers, with each server supporting its own application area. In the case of reverse proxying web servers, the reverse proxy may have to rewrite the URL in each incoming request in order to match the relevant internal location of the requested resource. A reverse proxy can reduce load on its origin servers by caching static content and dynamic content, known as web acceleration. Proxy caches of this sort can often satisfy a considerable number of website requests, greatly reducing the load on the origin server(s). A reverse proxy can optimize content by compressing it in order to speed up loading times. In a technique named "spoon-feeding", a dynamically generated page can be produced in its entirety and served to the reverse proxy, which can feed the page to the client as the connection allows. The program that generates the page need not remain open, thus releasing server resources during the possibly extended time the client requires to complete the transfer. Reverse proxies can operate wherever multiple web-servers must be accessible via a single public IP address. The web servers listen on different ports in the same machine, with the same local IP address or, possibly, on different machines with different local IP addresses. The reverse proxy analyzes each incoming request and delivers it to the right server within the local area network. Reverse proxies can perform A/B testing and multivariate testing without requiring application code to handle the logic of which version is served to a client. A reverse proxy can add access authentication to a web server that does not have any authentication. == Risks == When the transit traffic is encrypted and the reverse proxy needs to filter/cache/compress or otherwise modify or improve the traffic, the proxy first must decrypt and re-encrypt communications. This requires the proxy to possess the TLS certificate and its corresponding private key, extending the number of systems that can have access to non-encrypted data and making it a more valuable target for attackers. The vast majority of external data breaches happen either when hackers succeed in abusing an existing reverse proxy that was intentionally deployed by an organization, or when hackers succeed in converting an existing Internet-facing server into a reverse proxy server. Compromised or converted systems allow external attackers to specify where they want their attacks proxied to, enabling their access to internal networks and systems. Applications that were developed for the internal use of a company are not typically hardened to public standards and are not necessarily designed to withstand all hacking attempts. When an organization allows external access to such internal applications via a reverse proxy, they might unintentionally increase their own attack surface and invite hackers. If a reverse proxy is not configured to filter attacks or it does not receive daily updates to keep its attack signature database up to date, a zero-day vulnerability can pass through unfiltered, enabling attackers to gain control of the system(s) that are behind the reverse proxy server. Giving the reverse proxy of a third party access to private keys (for caching or optimizing content) places the entire triad of confidentiality, integrity and availability in the hands of the third party who operates the proxy. A reverse proxy is a single point of failure for the back-end services it fronts: an outage caused by misconfiguration, a denial-of-service attack, or a software fault can make every fronted service unreachable to outside clients, even when the back-end services themselves remain healthy. For example, a 2020 outage at Cloudflare briefly took down major sites and services that relied on its reverse-proxy edge, including Discord.

    Read more →
  • Trust federation

    Trust federation

    A trust federation is part of the evolving Identity Metasystem that will bring a new layer of persistent identity and trusted data sharing to the Internet. Although the concept of trust federations is technology neutral, several protocols like SAML, OpenID, Information Card, XDI can handle the challenges of technical interoperability. The challenge of business and social interoperability requires a new type of cooperative association similar to a credit card association. Instead of banks, however, a trust federation is an alliance of i-brokers and their customers who agree to abide by a common set of agreements in the care and handling of customer data. A model for trust federations is offered by Open Identity Exchange and Kantara Initiative, which is applied in the U.S. Government ICAM Trust Framework. Some operational trust federations are: InCommon (academic, USA) REFEDs (Research and Education Federations, Europe) IGTF Interoperable Global Trust Federation Portalverbund Government Portal Federation, Austria Trust federations are not limited to the social web use case, but apply to all federations where trust in identity and compliance to other objectives of information security such as confidentiality, integrity and privacy is brokered.

    Read more →
  • Application permissions

    Application permissions

    Permissions are a means of controlling and regulating access to specific system- and device-level functions by software. Typically, types of permissions cover functions that may have privacy implications, such as the ability to access a device's hardware features (including the camera and microphone), and personal data (such as storage devices, contacts lists, and the user's present geographical location). Permissions are typically declared in an application's manifest, and certain permissions must be specifically granted at runtime by the user—who may revoke the permission at any time. Permission systems are common on mobile operating systems, where permissions needed by specific apps must be disclosed via the platform's app store. == Mobile devices == On mobile operating systems for smartphones and tablets, typical types of permissions regulate: Access to storage and personal information, such as contacts, calendar appointments, etc. Location tracking. Access to the device's internal camera and/or microphone. Access to biometric sensors, including fingerprint readers and other health sensors.. Internet access. Access to communications interfaces (including their hardware identifiers and signal strength where applicable, and requests to enable them), such as Bluetooth, Wi-Fi, NFC, and others. Making and receiving phone calls. Sending and reading text messages The ability to perform in-app purchases. The ability to "overlay" themselves within other apps. Installing, deleting and otherwise managing applications. Authentication tokens (e.g., OAuth tokens) from web services stored in system storage for sharing between apps. Prior to Android 6.0 "Marshmallow", permissions were automatically granted to apps at runtime, and they were presented upon installation in Google Play Store. Since Marshmallow, certain permissions now require the app to request permission at runtime by the user. These permissions may also be revoked at any time via Android's settings menu. Usage of permissions on Android are sometimes abused by app developers to gather personal information and deliver advertising; in particular, apps for using a phone's camera flash as a flashlight (which have grown largely redundant due to the integration of such functionality at the system level on later versions of Android) have been known to require a large array of unnecessary permissions beyond what is actually needed for the stated functionality. iOS imposes a similar requirement for permissions to be granted at runtime, with particular controls offered for enabling of Bluetooth, Wi-Fi, and location tracking. == WebPermissions == WebPermissions is a permission system for web browsers. When a web application needs some data behind permission, it must request it first. When it does it, a user sees a window asking him to make a choice. The choice is remembered, but can be cleared lately. Currently the following resources are controlled: geolocation desktop notifications service workers sensors audio capturing devices, like sound cards, and their model names and characteristics video capturing devices, like cameras, and their identifiers and characteristics == Analysis == The permission-based access control model assigns access privileges for certain data objects to application. This is a derivative of the discretionary access control model. The access permissions are usually granted in the context of a specific user on a specific device. Permissions are granted permanently with few automatic restrictions. In some cases permissions are implemented in 'all-or-nothing' approach: a user either has to grant all the required permissions to access the application or the user can not access the application. There is still a lack of transparency when the permission is used by a program or application to access the data protected by the permission access control mechanism. Even if a user can revoke a permission, the app can blackmail a user by refusing to operate, for example by just crashing or asking user to grant the permission again in order to access the application. The permission mechanism has been widely criticized by researchers for several reasons, including; Intransparency of personal data extraction and surveillance, including the creation of a false sense of security; End-user fatigue of micro-managing access permissions leading to a fatalistic acceptance of surveillance and intransparency; Massive data extraction and personal surveillance carried out once the permissions are granted. Some apps, such as XPrivacy and Mockdroid spoof data in order to act as a measure for privacy. Further transparency methods include longitudinal behavioural profiling and multiple-source privacy analysis of app data access.

    Read more →
  • Bitmap index

    Bitmap index

    A bitmap index is a special kind of database index that uses bitmaps. Bitmap indexes have traditionally been considered to work well for low-cardinality columns, which have a modest number of distinct values, either absolutely, or relative to the number of records that contain the data. The extreme case of low cardinality is Boolean data (e.g., does a resident in a city have internet access?), which has two values, True and False. Bitmap indexes use bit arrays (commonly called bitmaps) and answer queries by performing bitwise logical operations on these bitmaps. Bitmap indexes have a significant space and performance advantage over other structures for query of such data. Their drawback is they are less efficient than the traditional B-tree indexes for columns whose data is frequently updated: consequently, they are more often employed in read-only systems that are specialized for fast query - e.g., data warehouses, and generally unsuitable for online transaction processing applications. Some researchers argue that bitmap indexes are also useful for moderate or even high-cardinality data (e.g., unique-valued data) which is accessed in a read-only manner, and queries access multiple bitmap-indexed columns using the AND, OR or XOR operators extensively. Bitmap indexes are also useful in data warehousing applications for joining a large fact table to smaller dimension tables such as those arranged in a star schema. == Example == Continuing the internet access example, a bitmap index may be logically viewed as follows: On the left, Identifier refers to the unique number assigned to each resident, HasInternet is the data to be indexed, the content of the bitmap index is shown as two columns under the heading bitmaps. Each column in the left illustration under the Bitmaps header is a bitmap in the bitmap index. In this case, there are two such bitmaps, one for "has internet" Yes and one for "has internet" No. It is easy to see that each bit in bitmap Y shows whether a particular row refers to a person who has internet access. This is the simplest form of bitmap index. Most columns will have more distinct values. For example, the sales amount is likely to have a much larger number of distinct values. Variations on the bitmap index can effectively index this data as well. We briefly review three such variations. Note: Many of the references cited here are reviewed at (John Wu (2007)). For those who might be interested in experimenting with some of the ideas mentioned here, many of them are implemented in open source software such as FastBit, the Lemur Bitmap Index C++ Library, the Roaring Bitmap Java library and the Apache Hive Data Warehouse system. == Compression == For historical reasons, bitmap compression and inverted list compression were developed as separate lines of research, and only later were recognized as solving essentially the same problem. Software can compress each bitmap in a bitmap index to save space. There has been considerable amount of work on this subject. Though there are exceptions such as Roaring bitmaps, Bitmap compression algorithms typically employ run-length encoding, such as the Byte-aligned Bitmap Code, the Word-Aligned Hybrid code, the Partitioned Word-Aligned Hybrid (PWAH) compression, the Position List Word Aligned Hybrid, the Compressed Adaptive Index (COMPAX), Enhanced Word-Aligned Hybrid (EWAH) and the COmpressed 'N' Composable Integer SEt (CONCISE). These compression methods require very little effort to compress and decompress. More importantly, bitmaps compressed with BBC, WAH, COMPAX, PLWAH, EWAH and CONCISE can directly participate in bitwise operations without decompression. This gives them considerable advantages over generic compression techniques such as LZ77. BBC compression and its derivatives are used in a commercial database management system. BBC is effective in both reducing index sizes and maintaining query performance. BBC encodes the bitmaps in bytes, while WAH encodes in words, better matching current CPUs. "On both synthetic data and real application data, the new word aligned schemes use only 50% more space, but perform logical operations on compressed data 12 times faster than BBC." PLWAH bitmaps were reported to take 50% of the storage space consumed by WAH bitmaps and offer up to 20% faster performance on logical operations. Similar considerations can be done for CONCISE and Enhanced Word-Aligned Hybrid. The performance of schemes such as BBC, WAH, PLWAH, EWAH, COMPAX and CONCISE is dependent on the order of the rows. A simple lexicographical sort can divide the index size by 9 and make indexes several times faster. The larger the table, the more important it is to sort the rows. Reshuffling techniques have also been proposed to achieve the same results of sorting when indexing streaming data. == Encoding == Basic bitmap indexes use one bitmap for each distinct value. It is possible to reduce the number of bitmaps used by using a different encoding method. For example, it is possible to encode C distinct values using log(C) bitmaps with binary encoding. This reduces the number of bitmaps, further saving space, but to answer any query, most of the bitmaps have to be accessed. This makes it potentially not as effective as scanning a vertical projection of the base data, also known as a materialized view or projection index. Finding the optimal encoding method that balances (arbitrary) query performance, index size and index maintenance remains a challenge. Without considering compression, Chan and Ioannidis analyzed a class of multi-component encoding methods and came to the conclusion that two-component encoding sits at the kink of the performance vs. index size curve and therefore represents the best trade-off between index size and query performance. == Binning == For high-cardinality columns, it is useful to bin the values, where each bin covers multiple values and build the bitmaps to represent the values in each bin. This approach reduces the number of bitmaps used regardless of encoding method. However, binned indexes can only answer some queries without examining the base data. For example, if a bin covers the range from 0.1 to 0.2, then when the user asks for all values less than 0.15, all rows that fall in the bin are possible hits and have to be checked to verify whether they are actually less than 0.15. The process of checking the base data is known as the candidate check. In most cases, the time used by the candidate check is significantly longer than the time needed to work with the bitmap index. Therefore, binned indexes exhibit irregular performance. They can be very fast for some queries, but much slower if the query does not exactly match a bin. == History == The concept of bitmap index was first introduced by Professor Israel Spiegler and Rafi Maayan in their research "Storage and Retrieval Considerations of Binary Data Bases", published in 1985. The first commercial database product to implement a bitmap index was Computer Corporation of America's Model 204. Patrick O'Neil published a paper about this implementation in 1987. This implementation is a hybrid between the basic bitmap index (without compression) and the list of Row Identifiers (RID-list). Overall, the index is organized as a B+tree. When the column cardinality is low, each leaf node of the B-tree would contain long list of RIDs. In this case, it requires less space to represent the RID-lists as bitmaps. Since each bitmap represents one distinct value, this is the basic bitmap index. As the column cardinality increases, each bitmap becomes sparse and it may take more disk space to store the bitmaps than to store the same content as RID-lists. In this case, it switches to use the RID-lists, which makes it a B+tree index. == In-memory bitmaps == One of the strongest reasons for using bitmap indexes is that the intermediate results produced from them are also bitmaps and can be efficiently reused in further operations to answer more complex queries. Many programming languages support this as a bit array data structure. For example, Java has the BitSet class and .NET have the BitArray class. Some database systems that do not offer persistent bitmap indexes use bitmaps internally to speed up query processing. For example, PostgreSQL versions 8.1 and later implement a "bitmap index scan" optimization to speed up arbitrarily complex logical operations between available indexes on a single table. For tables with many columns, the total number of distinct indexes to satisfy all possible queries (with equality filtering conditions on either of the fields) grows very fast, being defined by this formula: C n [ n 2 ] ≡ n ! ( n − [ n 2 ] ) ! [ n 2 ] ! {\displaystyle \mathbf {C} _{n}^{\left[{\frac {n}{2}}\right]}\equiv {\frac {n!}{\left(n-\left[{\frac {n}{2}}\right]\right)!\left[{\frac {n}{2}}\right]!}}} . A bitmap index scan combines expressions on different indexes, thus requiring only one index per column t

    Read more →
  • Social media measurement

    Social media measurement

    Social media measurement, also called social media controlling, is the management practice of evaluating successful social media communications of brands, companies, or other organizations. Key performance indicators may be measured by extracting information from social media channels, such as blogs, wikis, micro-blogs such as Twitter, social networking sites, or video/photo sharing websites, forums from time to time. It is also used by companies to gauge current trends in the industry. The process first gathers data from different websites and then performs analysis based on different metrics like time spent on the page, click through rate, content share, comments, text analytics to identify positive or negative emotions about the brand. Some other social media metrics include share of voice, owned mentions, and earned mentions. The social media measurement process starts with defining a goal that needs to be achieved and defining the expected outcome of the process. The expected outcome varies per the goal and is usually measured by a variety of metrics. This is followed by defining possible social strategies to be used to achieve the goal. Then the next step is designing strategies to be used and setting up configuration tools that ease the process of collecting the data. In the next step, strategies and tools are deployed in real-time. This step involves conducting Quality Assurance tests of the methods deployed to collect the data. And in the final step, data collected from the system is analyzed and if the need arises, it is refined on the run time to enhance the methodologies used. The last step ensures that the result obtained is more aligned with the goal defined in the first step. == Data Acquisition == Acquiring data from social media is in demand of an exploring the user participation and population with the purpose of retrieving and collecting so many kinds of data(ex: comments, downloads etc.). There are several prevalent techniques to acquire data such as Network traffic analysis, Ad-hoc application and Crawling Network Traffic Analysis - Network traffic analysis is the process of capturing network traffic and observing it closely to determine what is happening in the network. It is primarily done to improve the performance, security and other general management of the network. However concerned about the potential tort of privacy on the Internet, network traffic analysis is always restricted by the government. Furthermore, high-speed links are not adaptable to traffic analysis because of the possible overload problem according to the packet sniffing mechanism Ad-hoc Application - Ad-hoc application is a kind of application that provides services and games to social network users by developing the APIs offered by social network companies (Facebook Developer Platform). The infrastructure of Ad-hoc application allows the user to interact with the interface layer instead of the application servers. The API provides a path for application to access information after the user login. Moreover, the size of the data set collected vary with the popularity of the social media platform i.e. social media platforms having high number of users will have more data than platforms having less user base. Scraping is a process in which the APIs collect online data from social media. The data collected from Scraping is in raw format. However, having access to these types of data is a bit difficult because of its commercial value. Crawling - Crawling is a process in which a web crawler creates indexes of all the words in a web-page, stores them, then follows all the hyperlinks and indexes on that page and again stores them. It is the most popular technique for data acquisition and is also well known for its easy operation based on prevalent Object-Orientated Programming Language (Java or Python etc.). And most important, social network companies (YouTube, Flicker, Facebook, Instagram, etc.) are friendly to crawling techniques by providing public APIs == Applications == === For branding === Monitoring social media allows researchers to find insights into a brand's overall visibility on social media, to measure the impact of campaigns, to identify opportunities for engagement, to assess competitor activity and share of voice, and to detect impending crises. It can also provide valuable information about emerging trends and what consumers and clients think about specific topics, brands or products. This is the work of a cross-section of groups that include market researchers, PR staff, marketing teams, social-engagement, and community staff, agencies and sales teams. Several different providers have developed tools to facilitate the monitoring of a variety of social media channels - from blogging to internet video to internet forums. This allows companies to track what consumers say about their brands and actions. Companies can then react to these conversations and interact with consumers through social media platforms. === In government === Apart from commercial applications, social media monitoring has become a pervasive technique applied by public organizations and governments. Monitoring is a tradition within the public sector, and social-media monitoring provides a real-time approach to detecting and responding to social developments. Governments have come to realize the need for strategies to cope with surprises from the rapid expansion of public issues. Sobkowicz introduced a framework with three blocks of social-media opinion tracking, simulating and forecasting. It includes: real-time detection of emotions, topics and opinions information-flow modelling and agent-based simulation modeling of opinion networks Bekkers introduced the application of social media monitoring in the Netherlands. Public organizations in the Netherlands (such as the Tax Agency and the Education Ministry) have started to use social media monitoring to obtain better insights into the sentiments of target groups. On the one hand, the public sector will be enabled to provide timely and efficient answers to the public by using social media monitoring techniques, but on the other hand, they also have to deal with concerns about ethical issues such as transparency and privacy. == Quantifying social media == Social media management software (SMMS) is an application program or software that facilitates an organization's ability to successfully engage in social media across different communication channels. SMMS is used to monitor inbound and outbound conversations, support customer interaction, audit or document social marketing initiatives and evaluate the usefulness of a social media presence. It can be difficult to measure all social media conversations. Due to privacy settings and other issues, not all social media conversations can be found and reported by monitoring tools. However, whilst social media monitoring cannot give absolute figures, it can be extremely useful for identifying trends and for benchmarking, in addition to the uses mentioned above. These findings can, in turn, influence and shape future business decisions. In order to access social media data (posts, Tweets, and meta-data) and to analyze and monitor social media, many companies use software technologies built for business. These range from in-platform analytics dashboards to dedicated third-party platforms, which offer more advanced capabilities including cross-platform audience intelligence, sentiment analysis, and trend detection at scale. == Location-based == Most social media networks allow users to add a location to their posts (reference all of our feeds). The location can be classified as either 'at-the-location' or 'about-the-location'. "'At-the-location' services can be defined as services where location-based content is created at the geographic location. 'About-the-location' services can be defined as services which are referring to a particular location but the content is not necessarily created in this particular physical place." The added information available from geotagged (link to Geotagging article) posts means that they can be displayed on a map. This means that a location can be used as the start of a social media search rather than a keyword or hashtag. This has major implications for disaster relief, event monitoring, safety and security professionals since a large portion of their job is related to tracking and monitoring specific locations. == Technologies used == Various monitoring platforms use different technologies for social media monitoring and measurement. These technology providers may connect to the API provided by social platforms that are created for 3rd party developers to develop their own applications and services that access data. Facebook's Graph API is one such API that social media monitoring solution products would connect to pull data from. Some social media monitoring and analytics companies use calls to data providers each time an end-user d

    Read more →