AI Code Janitor

AI Code Janitor — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Scale-space axioms

    Scale-space axioms

    In image processing and computer vision, a scale space framework can be used to represent an image as a family of gradually smoothed images. This framework is very general and a variety of scale space representations exist. A typical approach for choosing a particular type of scale space representation is to establish a set of scale-space axioms, describing basic properties of the desired scale-space representation and often chosen so as to make the representation useful in practical applications. Once established, the axioms narrow the possible scale-space representations to a smaller class, typically with only a few free parameters. A set of standard scale space axioms, discussed below, leads to the linear Gaussian scale-space, which is the most common type of scale space used in image processing and computer vision. == Scale space axioms for the linear scale-space representation == The linear scale space representation L ( x , y , t ) = ( T t f ) ( x , y ) = g ( x , y , t ) ∗ f ( x , y ) {\displaystyle L(x,y,t)=(T_{t}f)(x,y)=g(x,y,t)f(x,y)} of signal f ( x , y ) {\displaystyle f(x,y)} obtained by smoothing with the Gaussian kernel g ( x , y , t ) {\displaystyle g(x,y,t)} satisfies a number of properties 'scale-space axioms' that make it a special form of multi-scale representation: linearity T t ( a f + b h ) = a T t f + b T t h {\displaystyle T_{t}(af+bh)=aT_{t}f+bT_{t}h} where f {\displaystyle f} and h {\displaystyle h} are signals while a {\displaystyle a} and b {\displaystyle b} are constants, shift invariance T t S ( Δ x , Δ y ) f = S ( Δ x , Δ y ) T t f {\displaystyle T_{t}S_{(\Delta x,\Delta _{y})}f=S_{(\Delta x,\Delta _{y})}T_{t}f} where S ( Δ x , Δ y ) {\displaystyle S_{(\Delta x,\Delta _{y})}} denotes the shift (translation) operator ( S ( Δ x , Δ y ) f ) ( x , y ) = f ( x − Δ x , y − Δ y ) {\displaystyle (S_{(\Delta x,\Delta _{y})}f)(x,y)=f(x-\Delta x,y-\Delta y)} semi-group structure g ( x , y , t 1 ) ∗ g ( x , y , t 2 ) = g ( x , y , t 1 + t 2 ) {\displaystyle g(x,y,t_{1})g(x,y,t_{2})=g(x,y,t_{1}+t_{2})} with the associated cascade smoothing property L ( x , y , t 2 ) = g ( x , y , t 2 − t 1 ) ∗ L ( x , y , t 1 ) {\displaystyle L(x,y,t_{2})=g(x,y,t_{2}-t_{1})L(x,y,t_{1})} existence of an infinitesimal generator A {\displaystyle A} ∂ t L ( x , y , t ) = ( A L ) ( x , y , t ) {\displaystyle \partial _{t}L(x,y,t)=(AL)(x,y,t)} non-creation of local extrema (zero-crossings) in one dimension, non-enhancement of local extrema in any number of dimensions ∂ t L ( x , y , t ) ≤ 0 {\displaystyle \partial _{t}L(x,y,t)\leq 0} at spatial maxima and ∂ t L ( x , y , t ) ≥ 0 {\displaystyle \partial _{t}L(x,y,t)\geq 0} at spatial minima, rotational symmetry g ( x , y , t ) = h ( x 2 + y 2 , t ) {\displaystyle g(x,y,t)=h(x^{2}+y^{2},t)} for some function h {\displaystyle h} , scale invariance g ^ ( ω x , ω y , t ) = h ^ ( ω x φ ( t ) , ω x φ ( t ) ) {\displaystyle {\hat {g}}(\omega _{x},\omega _{y},t)={\hat {h}}({\frac {\omega _{x}}{\varphi (t)}},{\frac {\omega _{x}}{\varphi (t)}})} for some functions φ {\displaystyle \varphi } and h ^ {\displaystyle {\hat {h}}} where g ^ {\displaystyle {\hat {g}}} denotes the Fourier transform of g {\displaystyle g} , positivity g ( x , y , t ) ≥ 0 {\displaystyle g(x,y,t)\geq 0} , normalization ∫ x = − ∞ ∞ ∫ y = − ∞ ∞ g ( x , y , t ) d x d y = 1 {\displaystyle \int _{x=-\infty }^{\infty }\int _{y=-\infty }^{\infty }g(x,y,t)\,dx\,dy=1} . In fact, it can be shown that the Gaussian kernel is a unique choice given several different combinations of subsets of these scale-space axioms: most of the axioms (linearity, shift-invariance, semigroup) correspond to scaling being a semigroup of shift-invariant linear operator, which is satisfied by a number of families integral transforms, while "non-creation of local extrema" for one-dimensional signals or "non-enhancement of local extrema" for higher-dimensional signals are the crucial axioms which relate scale-spaces to smoothing (formally, parabolic partial differential equations), and hence select for the Gaussian. The Gaussian kernel is also separable in Cartesian coordinates, i.e. g ( x , y , t ) = g ( x , t ) g ( y , t ) {\displaystyle g(x,y,t)=g(x,t)\,g(y,t)} . Separability is, however, not counted as a scale-space axiom, since it is a coordinate dependent property related to issues of implementation. In addition, the requirement of separability in combination with rotational symmetry per se fixates the smoothing kernel to be a Gaussian. There exists a generalization of the Gaussian scale-space theory to more general affine and spatio-temporal scale-spaces. In addition to variabilities over scale, which original scale-space theory was designed to handle, this generalized scale-space theory also comprises other types of variabilities, including image deformations caused by viewing variations, approximated by local affine transformations, and relative motions between objects in the world and the observer, approximated by local Galilean transformations. In this theory, rotational symmetry is not imposed as a necessary scale-space axiom and is instead replaced by requirements of affine and/or Galilean covariance. The generalized scale-space theory leads to predictions about receptive field profiles in good qualitative agreement with receptive field profiles measured by cell recordings in biological vision. In the computer vision, image processing and signal processing literature there are many other multi-scale approaches, using wavelets and a variety of other kernels, that do not exploit or require the same requirements as scale space descriptions do; please see the article on related multi-scale approaches. There has also been work on discrete scale-space concepts that carry the scale-space properties over to the discrete domain; see the article on scale space implementation for examples and references.

    Read more →
  • NumPy

    NumPy

    NumPy (pronounced NUM-py) is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The predecessor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software and has many contributors. NumPy is fiscally sponsored by NumFOCUS. == History == === matrix-sig === The Python programming language was not originally designed for numerical computing, but attracted the attention of the scientific and engineering community early on. In 1995 the special interest group (SIG) matrix-sig was founded with the aim of defining an array computing package; among its members was Python designer and maintainer Guido van Rossum, who extended Python's syntax (in particular the indexing syntax) to make array computing easier. === Numeric === An implementation of a matrix package was completed by Jim Fulton, then expanded to support multi-dimensional arrays by Jim Hugunin and called Numeric (also variously known as the "Numerical Python extensions" or "NumPy"), with influences from the APL family of languages, Basis, MATLAB, FORTRAN, S and S+, and others. Hugunin, a graduate student at the Massachusetts Institute of Technology (MIT), joined the Corporation for National Research Initiatives (CNRI) in 1997 to work on JPython, leaving Paul Dubois of Lawrence Livermore National Laboratory (LLNL) to take over as maintainer. Other early contributors include David Ascher, Konrad Hinsen and Travis Oliphant. === Numarray === A new package called Numarray was written as a more flexible replacement for Numeric. Like Numeric, it too is now deprecated. Numarray had faster operations for large arrays, but was slower than Numeric on small ones, so for a time both packages were used in parallel for different use cases. The last version of Numeric (v24.2) was released on 11 November 2005, while the last version of numarray (v1.5.2) was released on 24 August 2006. There was a desire to get Numeric into the Python standard library, but Guido van Rossum decided that the code was not maintainable in its state then. === NumPy === In early 2005, NumPy developer Travis Oliphant wanted to unify the community around a single array package and ported Numarray's features to Numeric, releasing the result as NumPy 1.0 in 2006. This new project was part of SciPy. To avoid installing the large SciPy package just to get an array object, this new package was separated and called NumPy. Support for Python 3 was added in 2011 with NumPy version 1.5.0. In 2011, PyPy started development on an implementation of the NumPy API for PyPy. As of 2023, it is not yet fully compatible with NumPy. == Features == NumPy targets the CPython reference implementation of Python, which is a non-optimizing bytecode interpreter. Mathematical algorithms written for this version of Python often run much slower than compiled equivalents due to the absence of compiler optimization. NumPy addresses the slowness problem partly by providing multidimensional arrays and functions and operators that operate efficiently on arrays; using these requires rewriting some code, mostly inner loops, using NumPy. Using NumPy in Python gives functionality comparable to MATLAB since they are both interpreted, and they both allow the user to write fast programs as long as most operations work on arrays or matrices instead of scalars. In comparison, MATLAB boasts a large number of additional toolboxes, notably Simulink, whereas NumPy is intrinsically integrated with Python, a more modern and complete programming language. Moreover, complementary Python packages are available; SciPy is a library that adds more MATLAB-like functionality and Matplotlib is a plotting package that provides MATLAB-like plotting functionality. Although MATLAB can perform sparse matrix operations, NumPy alone cannot perform such operations and requires the use of the scipy.sparse library. Internally, both MATLAB and NumPy rely on BLAS and LAPACK for efficient linear algebra computations. Python bindings of the widely used computer vision library OpenCV utilize NumPy arrays to store and operate on data. Since images with multiple channels are simply represented as three-dimensional arrays, indexing, slicing or masking with other arrays are very efficient ways to access specific pixels of an image. The NumPy array as universal data structure in OpenCV for images, extracted feature points, filter kernels and many more vastly simplifies the programming workflow and debugging. Importantly, many NumPy operations release the global interpreter lock, which allows for multithreaded processing. NumPy also provides a C API, which allows Python code to interoperate with external libraries written in low-level languages. === The ndarray data structure === The core functionality of NumPy is its "ndarray", for n-dimensional array, data structure. These arrays are strided views on memory. In contrast to Python's built-in list data structure, these arrays are homogeneously typed: all elements of a single array must be of the same type. Such arrays can also be views into memory buffers allocated by C/C++, Python, and Fortran extensions to the CPython interpreter without the need to copy data around, giving a degree of compatibility with existing numerical libraries. This functionality is exploited by the SciPy package, which wraps a number of such libraries (notably BLAS and LAPACK). NumPy has built-in support for memory-mapped ndarrays. === Limitations === Inserting or appending entries to an array is not as trivially possible as it is with Python's lists. The np.pad(...) routine to extend arrays actually creates new arrays of the desired shape and padding values, copies the given array into the new one and returns it. NumPy's np.concatenate([a1,a2]) operation does not actually link the two arrays but returns a new one, filled with the entries from both given arrays in sequence. Reshaping the dimensionality of an array with np.reshape(...) is only possible as long as the number of elements in the array does not change. These circumstances originate from the fact that NumPy's arrays must be views on contiguous memory buffers. Algorithms that are not expressible as a vectorized operation will typically run slowly because they must be implemented in "pure Python", while vectorization may increase memory complexity of some operations from constant to linear, because temporary arrays must be created that are as large as the inputs. Runtime compilation of numerical code has been implemented by several groups to avoid these problems; open source solutions that interoperate with NumPy include numexpr and Numba. Cython and Pythran are static-compiling alternatives to these. Many modern large-scale scientific computing applications have requirements that exceed the capabilities of the NumPy arrays. For example, NumPy arrays are usually loaded into a computer's memory, which might have insufficient capacity for the analysis of large datasets. Further, NumPy operations are executed on a single CPU. However, many linear algebra operations can be accelerated by executing them on clusters of CPUs or of specialized hardware, such as GPUs and TPUs, which many deep learning applications rely on. As a result, several alternative array implementations have arisen in the scientific python ecosystem over the recent years, such as Dask for distributed arrays and TensorFlow or JAX for computations on GPUs. Because of its popularity, these often implement a subset of NumPy's API or mimic it, so that users can change their array implementation with minimal changes to their code required. A library named CuPy, accelerated by Nvidia's CUDA framework, has also shown potential for faster computing, being a 'drop-in replacement' of NumPy. == Examples == NumPy is conventionally imported as np. === Basic operations === === Universal functions === === Linear algebra === === Multidimensional arrays === === Incorporation with OpenCV === === Nearest-neighbor search === Functional Python and vectorized NumPy version. === F2PY === Quickly wrap native code for faster scripts.

    Read more →
  • Server.com

    Server.com

    Server.com is a domain name that was owned by software as a service (SaaS) company Server Corporation. They offered a suite of services from 1996 until 2007. It was the first SaaS site to offer a variety of services and the first to use the term WebApp to describe its services. It was selected as an Incredibly Useful Site by Yahoo! Internet Life magazine. net magazine listed Server.com among the 100 most influential websites of all time. Server.com launched in 1996 offering the first online personal information manager. In 1997, they rolled out the first threaded message board service; the first web based mailing list manager; one of the first online calendar services; and one of the first online form builders. In 2000, Server.com partnered with NBCi and became server.snap.com until 2001. In 2001, Server.com was serving 100 million monthly pageviews. Media Life declared it one of the 20 biggest ad domains on the Web. In 2002, Server.com developed one of the first web-based RSS aggregators. In 2007, all services were moved to YourWebApps.com. The domain name Server.com was sold in 2009 for $770,000.

    Read more →
  • Cloud printing

    Cloud printing

    There are, in essence, three kinds of Cloud printing. == Benefits == 76% of IT teams have moved, or plan to move, their print workflows to the cloud due to its simplicity. Consumers can print easily to any printer from their PC, tablet or smartphone, while the Cloud print service monitors the supplies level. Many printer vendors such as Lexmark propose an automatic supplies shipment based on the real-time analysis of the printer supplies and user behavior to ensure printing will always be possible. For IT department, Cloud Printing eliminates the need for print servers and represents the only way to print from Cloud virtual desktops and servers. For consumers, cloud ready printers eliminate the need for PC connections and print drivers, enabling them to print from mobile devices. As for publishers and content owners, cloud printing allows them to "avoid the cost and complexity of buying and managing the underlying hardware, software and processes" required for the production of professional print products. Leveraging cloud print for print on demand also allows businesses to cut down on the costs associated with mass production. Moreover, cloud printing can be considered more eco-friendly, as it significantly reduces the amount of paper used (13% reduction in print jobs yearly) and lowers carbon emissions from transportation. As many companies move their IT to the Cloud, some adopting the Windows 365 and Azure Virtual Desktop services from Microsoft, the connection from the Cloud environment to the on-premise printers become an issue as opening ports for incoming print flow traffic is not an option. In 2020, at the exact same time Google discontinued its Google Print offer, Microsoft has announced its Universal Print service offer, aimed at making printing compatible with Cloud Desktop environments, making printing driver-free and simple with no client to install on PC. With Universal Print Microsoft has built a disrupting architecture with a value proposition commodifying printers, removing print servers and drivers, allowing to move printers to VLAN for security purpose and printing from anywhere. Clients are free to use any printer from any model as they all work the same, clients are not tied anymore to any printer brand and that gave a significant boost to the Cloud print market. That Microsoft Universal Print architecture provides APIs to third-party developers who can develop add-ons such as Celiveo 365 to extend Microsoft Cloud Print with added features such as access control on printers and copiers, follow-me pull print, data encryption, advanced usage reporting or charge back. == Providers of Consumer Cloud Printing Solutions == Before 2020 only a handful of providers used to work towards a professional cloud print solution, operating in their own niche or focus on mobile devices. In 2020 Microsoft has boosted that market by announcing its Universal Print Cloud printing service and since then many publishers have started to propose solutions for that growing market. The Covid pandemic also created the need for employees to be able to print at home when using the corporate IT software. Closed VPN often prevent accessing home network printers from corporate laptops and Full Public Cloud solutions are meant to be a solution to that problem. After the decision by Google to terminate Google Cloud Print service on 31 December 2020, most printer vendors released their own mobile cloud solution to fill the gap, while Hewlett-Packard implemented its own cloud print with their ePrint solution. Those solutions are often proprietary, only working on printers proposed by the vendor. Google has decided to let third-party developers develop Cloud Print solutions and to limit its scope to certifying the best Print Management offers compatible with its Chrome Enterprise Cloud ecosystem. == Providers of Corporate Cloud Printing solutions == While many print solutions claim to be "Cloud Printing", there are actually three categories: full Private Cloud, full Public Cloud, and Hybrid Cloud. Their differences are real and have an impact on the overall TCO as the more software there is on-site, the more hidden cost there are. In the Full Public Cloud category, independent SaaS vendors like Celiveo, ezeep , Printix , and Y Soft support a wide range of printer brands and models, allowing clients to buy the best printer without being locked on any brand. They are leveraging cloud computing technology to offer cloud-based print infrastructure and cloud-based printing software as a Service (SaaS). These solutions have integrations to cloud enabled printers or provide embedded printer agents. They feature allow users to print to any printer in any network, isolated network or not, even if that printer is otherwise not reachable from the user's computer. This also allows IT departments to move printers to VLAN for maximum security, like what they are doing with IP phones. Google Chrome Enterprise Cloud ecosystem has its own technical particularities and Google certifies Print Management solutions, ensuring they comply with Google technical requirement, yet letting each solution differentiate from others with specific features or security. Many of solutions for Chrome Enterprise are Hybrid, a few are Full Public Cloud. Industry experts believe that as these services become more popular, users will no longer consider printers as necessary assets but rather as devices that they can access on demand when the need to generate a printed page presents itself. == Caveats of Cloud Printing == == Security == Print jobs flow through Public Internet. It is therefore important to verify no Man-in-the-Middle attack can be performed. The only technical solution is to ensure each printer and PC uses a non-self-generated cryptographic token or certificate allowing TLS mutual authentication and specific data encryption. Self-generated printer certificates are unknown from the Cloud and prevent trusted authentication. Microsoft has implemented its Zero Trust Access security in its Universal Print service, it generates a unique certificate on printers compatible with its service. Other Cloud Printing SaaS providers have followed Microsoft on that High Security path. Print jobs data stored on the Cloud is sensitive as it contains user information as well as all information appearing on pages. Good practices require such data is encrypted at rest and in motion, using asymmetric PKI keys instead of fixed encryption keys. Some solutions require to open incoming traffic ports on the firewall to let Cloud services communicate with printers attached behind that firewall (most of the time for IPP/IPPS flows), some other solutions use a pull model where the communication is always initiated by the printer and no firewall port needs to be open. In terms of security the later is to be preferred.

    Read more →
  • Fyuse

    Fyuse

    Fyuse is a spatial photography app which lets users capture and share interactive 3D images. By tilting or swiping one's smartphone, one can view such "fyuses" from various angles — as if one were walking around an object or subject. The app blends photography and video to create an interactive medium and was first published for iOS in April 2014. The Android version was released at the end of 2014. == The app == Fyuse lets users capture panoramas, selfies, and full 360° views of objects and allows one to view captured moments from different angles. It has its own personal gallery, social network and standalone web integration. With the app, Fyusion also created a social networking platform similar to Instagram. Fyuses can be shared, commented on, liked and re-shared to one's followers (called Echoes). One can build a network of followers and with engagement tracking, one can see how many times an image has been interacted with The images can also be saved for private, offline view, or shared to other social networks, like Facebook or Twitter, or embedded on a website where the images can be interacted with by desktop users via dragging the mouse. Furthermore, in the compass tab other fyuses can be discovered using the app's system of tags and categories. One's Fyuse feed is prepopulated with top users, and one can follow people to see when they post a new fyuse. The app will also find one's friends if one signs up with Facebook or connects it with one's Twitter account. To create a fyuse one moves around a person or object with one's phone's camera in one direction or moving/tilting one's phone around while holding one's finger on the screen. By combining photography and video the app allows one to capture moments that one may not have otherwise been able to capture by recording not one moment in time but stitched together little moments. According to Fyusion CEO Radu Rusu, a photo freezes a moment in time, while a video captures moments in a linear timeline — both still flat, when viewed. A fyuse image captures a moment in space, where one can not only see one side of something, but also around it. When it is done rendering, fyuses can also be edited – one can trim the fyuse for length and edit the brightness, contrast, exposure, saturation and sharpness. One can also add a vignette and apply a filters, with options to adjust their intensity. After editing, one can write a description, add hashtags, and tag parts of the fyuse before one can (voluntarily) publish and share it. Version 1.0 has been described as "alpha prototype" and version 2.0 was released on 17 December 2014. Version 3.0 introduced 3D tagging by which users can layer 3D graphic that animate accordingly with each interaction to add some context to the content. Version 4.0 was released on December 21, 2016 for iOS. Since January 2016 (v3.2) the app allows the export of fyuses as Live Photos. The app has also been described as a more sophisticated version of 3D stickers and flip images. == Applications == The app has many applications for e-commerce such as for fashion designers who want to showcase a garment from every angle, or real estate listings and Airbnb-type sites that want to make their rental properties seem as enticing as possible. The app can also be used for interactive art, 360° panoramas and selfies. == History == San Francisco-based Fyusion Inc.'s three founders — Radu B. Rusu, CTO Stefan Holzer, and VP of Engineering Stephen Miller — worked together at Willow Garage, the robotics research lab started by early Google employee Scott Hassan in the area of "personal robotics" — Hassan decided to turn the lab into more of an incubator, suggesting that the members spin off their technologies into consumer-facing enterprises. Rusu first set out with an open-source 3D perception software startup called Open Perception. Fyusion was officially founded in 2013, and soon after Rusu and his cofounders patented the technology for spatial photography. The company closed a seed funding round at the end of May, raising $3.35 million from investors, including an angel investment from Sun Microsystems cofounder Andreas Bechtolsheim. In 2014 the Fyuse team consisted of 13 employees, mostly engineers and designers, recruited from around the globe. In March 2015 the team displayed their app at Katy Perry's premiere for the movie "Prismatic World Tour on Epix" where Perry also took Fyuse for a test run. == Augmented reality == In September 2016 Fyusion unveiled its platform for creating augmented reality content using ones smartphone. It takes the images from ones smartphone and converts them into 3D holographic images, which one can then view on an AR headset. According to Rusu "by making it easy for people to capture their surroundings on any mobile device, [Fyusion is] revolutionizing the way that people view the world around them" and also states that for "AR to be successful, anyone should be able to create content for it" opposed to the current "small number of content creators and an even smaller number of hardware players". According to him "the applications of [Fyusion's] technology for consumers and businesses are incredibly limitless". The platform uses the company's patented 3D spatio-temporal platform that uses advanced sensor fusion, machine learning and computer vision algorithms and part of the platform is built into the Fyuse app. Before committing to releasing a separate consumer product the company intends to wait until the HoloLens device becomes available to the public. Until then any Fyuse representation created using Fyuse is AR ready and will be able to be shown in HoloLens in the future. == Fyuse - Point of No Return == Fyuse - Point of No Return is a science fiction short advert for Fyuse 3.0 in which Fyuse's digital medium is extrapolated into the future. In the film a woman uses a mini scanning-drone to 3D scan a tree with Fyuse and later recreate it as an augmented reality object at another place.

    Read more →
  • Apache Pig

    Apache Pig

    Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. The language for this platform is called Pig Latin. Pig can execute its Hadoop jobs in MapReduce, Apache Tez, or Apache Spark. Pig Latin abstracts the programming from the Java MapReduce idiom into a notation which makes MapReduce programming high level, similar to that of SQL for relational database management systems. Pig Latin can be extended using user-defined functions (UDFs) which the user can write in Java, Python, JavaScript, Ruby or Groovy and then call directly from the language. == History == Apache Pig was originally developed at Yahoo Research around 2006 for researchers to have an ad hoc way of creating and executing MapReduce jobs on very large data sets. In 2007, it was moved into the Apache Software Foundation. === Naming === Regarding the naming of the Pig programming language, the name was chosen arbitrarily and stuck because it was memorable, easy to spell, and for novelty. The story goes that the researchers working on the project initially referred to it simply as 'the language'. Eventually they needed to call it something. Off the top of his head, one researcher suggested Pig, and the name stuck. It is quirky yet memorable and easy to spell. While some have hinted that the name sounds coy or silly, it has provided us with an entertaining nomenclature, such as Pig Latin for the language, Grunt for the shell, and PiggyBank for the CPAN-like shared repository. == Example == Below is an example of a "Word Count" program in Pig Latin: The above program will generate parallel executable tasks which can be distributed across multiple machines in a Hadoop cluster to count the number of words in a dataset such as all the webpages on the internet. == Pig vs SQL == In comparison to SQL, Pig has a nested relational model, uses lazy evaluation, uses extract, transform, load (ETL), is able to store data at any point during a pipeline, declares execution plans, supports pipeline splits, thus allowing workflows to proceed along DAGs instead of strictly sequential pipelines. On the other hand, it has been argued DBMSs are substantially faster than the MapReduce system once the data is loaded, but that loading the data takes considerably longer in the database systems. It has also been argued RDBMSs offer out of the box support for column-storage, working with compressed data, indexes for efficient random data access, and transaction-level fault tolerance. Pig Latin is procedural and fits very naturally in the pipeline paradigm while SQL is instead declarative. In SQL users can specify that data from two tables must be joined, but not what join implementation to use (You can specify the implementation of JOIN in SQL, thus "... for many SQL applications the query writer may not have enough knowledge of the data or enough expertise to specify an appropriate join algorithm."). Pig Latin allows users to specify an implementation or aspects of an implementation to be used in executing a script in several ways. In effect, Pig Latin programming is similar to specifying a query execution plan, making it easier for programmers to explicitly control the flow of their data processing task. SQL is oriented around queries that produce a single result. SQL handles trees naturally, but has no built in mechanism for splitting a data processing stream and applying different operators to each sub-stream. Pig Latin script describes a directed acyclic graph (DAG) rather than a pipeline. Pig Latin's ability to include user code at any point in the pipeline is useful for pipeline development. If SQL is used, data must first be imported into the database, and then the cleansing and transformation process can begin.

    Read more →
  • Data cube

    Data cube

    In computer programming, a data cube (or datacube) is a multi-dimensional array of values. Typically, the term "data cube" is applied in contexts where these arrays are massively larger than the hosting computer's main memory; examples include multi-terabyte/petabyte data warehouses and time series of image data. Even though it is called a cube, a data cube generally is a multi-dimensional concept which can be 1-dimensional, 2-dimensional, 3-dimensional, or higher-dimensional. The data cube is used to represent data (sometimes called facts) along some dimensions of interest. In satellite image timeseries, dimensions would be latitude and longitude coordinates and time; a fact (sometimes called measure) would be a pixel at a given space and time as taken by the satellite. For example, in online analytical processing, an OLAP cube about a company would have dimensions that could be the company subsidiaries, the company products, and time; in this setup, a fact would be a sales event where a particular product has been sold in a particular subsidiary at a particular time. In any case, every dimension divides data into groups of cells whereas each cell in the cube represents a single measure of interest. Sometimes cubes hold only a few values with the rest being empty, i.e. undefined, while sometimes most or all cube coordinates hold a cell value. In the first case such data are called sparse, and in the second case they are called dense, although there is no hard delineation between the two. Data cubes may be stored in database management systems (DBMS) as part of array DBMS. Spatio-temporal databases and geospatial databases may also be represented as coverage data. == History == Multi-dimensional arrays have long been familiar in programming languages. Fortran offers arbitrarily-indexed 1-D arrays and arrays of arrays, which allows the construction of higher-dimensional arrays, up to 15 dimensions. APL supports n-D arrays with a rich set of operations. All these have in common that arrays must fit into the main memory and are available only while the particular program maintaining them (such as image processing software) is running. A series of data exchange formats support storage and transmission of data cube-like data, often tailored towards particular application domains. Examples include MDX for statistical (in particular, business) data, Zarr and Hierarchical Data Format for general scientific data, and TIFF for imagery. In 1992, Peter Baumann introduced management of massive data cubes with high-level user functionality combined with an efficient software architecture. Datacube operations include subset extraction, processing, fusion, and in general queries in the spirit of data manipulation languages like SQL. Some years after, the data cube concept was applied to describe time-varying business data as data cubes by Jim Gray, et al., and by Venky Harinarayan, Anand Rajaraman and Jeff Ullman. Around that time, a working group on Multi-Dimensional Databases ("Arbeitskreis Multi-Dimensionale Datenbanken") was established at German Gesellschaft für Informatik. Datacube Inc. was an image processing company selling hardware and software applications for the PC market in 1996, however without addressing data cubes as such. The EarthServer initiative has established geo data cube service requirements. == Standardization == In 2018, the ISO SQL database language was extended with data cube functionality as "SQL – Part 15: Multi-dimensional arrays (SQL/MDA)". Web Coverage Processing Service is a geo data cube analytics language issued by the Open Geospatial Consortium in 2008. In addition to the common data cube operations, the language knows about the semantics of space and time and supports both regular and irregular grid data cubes, based on the concept of coverage data. An industry standard for querying business data cubes, originally developed by Microsoft, is MultiDimensional eXpressions. == Implementation == Many high-level computer languages treat data cubes and other large arrays as single entities distinct from their contents. These languages, of which Fortran, APL, IDL, NumPy, PDL, and S-Lang are examples, allow the programmer to manipulate complete film clips and other data en masse with simple expressions derived from linear algebra and vector mathematics. Some languages (such as PDL) distinguish between a list of images and a data cube, while many (such as IDL) do not. Array DBMSs (Database Management Systems) offer a data model which generically supports definition, management, retrieval, and manipulation of n-dimensional data cubes. This database category has been pioneered by the rasdaman system since 1994. == Applications == Multi-dimensional arrays can meaningfully represent spatio-temporal sensor, image, and simulation data, but also statistics data where the semantics of dimensions is not necessarily of spatial or temporal nature. Generally, any kind of axis can be combined with any other into a data cube. === Mathematics === In mathematics, a one-dimensional array corresponds to a vector, a two-dimensional array resembles a matrix; more generally, a tensor may be represented as an n-dimensional data cube. === Science and engineering === For a time sequence of color images, the array is generally four-dimensional, with the dimensions representing image X and Y coordinates, time, and RGB (or other color space) color plane. For example, the EarthServer initiative unites data centers from different continents offering 3-D x/y/t satellite image timeseries and 4-D x/y/z/t weather data for retrieval and server-side processing through the Open Geospatial Consortium WCPS geo data cube query language standard. A data cube is also used in the field of imaging spectroscopy, since a spectrally-resolved image is represented as a three-dimensional volume. Earth observation data cubes combine satellite imagery such as Landsat 8 and Sentinel-2 with Geographic information system analytics. === Business intelligence === In online analytical processing (OLAP), data cubes are a common arrangement of business data suitable for analysis from different perspectives through operations like slicing, dicing, pivoting, and aggregation.

    Read more →
  • RCUDA

    RCUDA

    rCUDA, which stands for Remote CUDA, is a type of middleware software framework for remote GPU virtualization. Fully compatible with the CUDA application programming interface (API), it allows the allocation of one or more CUDA-enabled GPUs to a single application. Each GPU can be part of a cluster or running inside of a virtual machine. The approach is aimed at improving performance in GPU clusters that are lacking full utilization. GPU virtualization reduces the number of GPUs needed in a cluster, and in turn, leads to a lower cost configuration – less energy, acquisition, and maintenance. The recommended distributed acceleration architecture is a high performance computing cluster with GPUs attached to only a few of the cluster nodes. When a node without a local GPU executes an application needing GPU resources, remote execution of the kernel is supported by data and code transfers between local system memory and remote GPU memory. rCUDA is designed to accommodate this client-server architecture. On one end, clients employ a library of wrappers to the high-level CUDA Runtime API, and on the other end, there is a network listening service that receives requests on a TCP port. Several nodes running different GPU-accelerated applications can concurrently make use of the whole set of accelerators installed in the cluster. The client forwards the request to one of the servers, which accesses the GPU installed in that computer and executes the request in it. Time-multiplexing the GPU, or in other words sharing it, is accomplished by spawning different server processes for each remote GPU execution request. == rCUDA v20.07 == The rCUDA middleware enables the concurrent usage of CUDA-compatible devices remotely. rCUDA employs either the InfiniBand network or the socket API for the communication between clients and servers. rCUDA can be useful in three different environments: Clusters. To reduce the number of GPUs installed in High Performance Clusters. This leads to energy savings, as well as other related savings like acquisition costs, maintenance, space, cooling, etc. Academia. In commodity networks, to offer access to a few high performance GPUs concurrently to many students. Virtual Machines. To enable the access to the CUDA facilities on the physical machine. The current version of rCUDA (v20.07) supports CUDA version 9.0, excluding graphics interoperability. rCUDA v20.07 targets the Linux OS (for 64-bit architectures) on both client and server sides. CUDA applications do not need any change in their source code in order to be executed with rCUDA.

    Read more →
  • Data cube

    Data cube

    In computer programming, a data cube (or datacube) is a multi-dimensional array of values. Typically, the term "data cube" is applied in contexts where these arrays are massively larger than the hosting computer's main memory; examples include multi-terabyte/petabyte data warehouses and time series of image data. Even though it is called a cube, a data cube generally is a multi-dimensional concept which can be 1-dimensional, 2-dimensional, 3-dimensional, or higher-dimensional. The data cube is used to represent data (sometimes called facts) along some dimensions of interest. In satellite image timeseries, dimensions would be latitude and longitude coordinates and time; a fact (sometimes called measure) would be a pixel at a given space and time as taken by the satellite. For example, in online analytical processing, an OLAP cube about a company would have dimensions that could be the company subsidiaries, the company products, and time; in this setup, a fact would be a sales event where a particular product has been sold in a particular subsidiary at a particular time. In any case, every dimension divides data into groups of cells whereas each cell in the cube represents a single measure of interest. Sometimes cubes hold only a few values with the rest being empty, i.e. undefined, while sometimes most or all cube coordinates hold a cell value. In the first case such data are called sparse, and in the second case they are called dense, although there is no hard delineation between the two. Data cubes may be stored in database management systems (DBMS) as part of array DBMS. Spatio-temporal databases and geospatial databases may also be represented as coverage data. == History == Multi-dimensional arrays have long been familiar in programming languages. Fortran offers arbitrarily-indexed 1-D arrays and arrays of arrays, which allows the construction of higher-dimensional arrays, up to 15 dimensions. APL supports n-D arrays with a rich set of operations. All these have in common that arrays must fit into the main memory and are available only while the particular program maintaining them (such as image processing software) is running. A series of data exchange formats support storage and transmission of data cube-like data, often tailored towards particular application domains. Examples include MDX for statistical (in particular, business) data, Zarr and Hierarchical Data Format for general scientific data, and TIFF for imagery. In 1992, Peter Baumann introduced management of massive data cubes with high-level user functionality combined with an efficient software architecture. Datacube operations include subset extraction, processing, fusion, and in general queries in the spirit of data manipulation languages like SQL. Some years after, the data cube concept was applied to describe time-varying business data as data cubes by Jim Gray, et al., and by Venky Harinarayan, Anand Rajaraman and Jeff Ullman. Around that time, a working group on Multi-Dimensional Databases ("Arbeitskreis Multi-Dimensionale Datenbanken") was established at German Gesellschaft für Informatik. Datacube Inc. was an image processing company selling hardware and software applications for the PC market in 1996, however without addressing data cubes as such. The EarthServer initiative has established geo data cube service requirements. == Standardization == In 2018, the ISO SQL database language was extended with data cube functionality as "SQL – Part 15: Multi-dimensional arrays (SQL/MDA)". Web Coverage Processing Service is a geo data cube analytics language issued by the Open Geospatial Consortium in 2008. In addition to the common data cube operations, the language knows about the semantics of space and time and supports both regular and irregular grid data cubes, based on the concept of coverage data. An industry standard for querying business data cubes, originally developed by Microsoft, is MultiDimensional eXpressions. == Implementation == Many high-level computer languages treat data cubes and other large arrays as single entities distinct from their contents. These languages, of which Fortran, APL, IDL, NumPy, PDL, and S-Lang are examples, allow the programmer to manipulate complete film clips and other data en masse with simple expressions derived from linear algebra and vector mathematics. Some languages (such as PDL) distinguish between a list of images and a data cube, while many (such as IDL) do not. Array DBMSs (Database Management Systems) offer a data model which generically supports definition, management, retrieval, and manipulation of n-dimensional data cubes. This database category has been pioneered by the rasdaman system since 1994. == Applications == Multi-dimensional arrays can meaningfully represent spatio-temporal sensor, image, and simulation data, but also statistics data where the semantics of dimensions is not necessarily of spatial or temporal nature. Generally, any kind of axis can be combined with any other into a data cube. === Mathematics === In mathematics, a one-dimensional array corresponds to a vector, a two-dimensional array resembles a matrix; more generally, a tensor may be represented as an n-dimensional data cube. === Science and engineering === For a time sequence of color images, the array is generally four-dimensional, with the dimensions representing image X and Y coordinates, time, and RGB (or other color space) color plane. For example, the EarthServer initiative unites data centers from different continents offering 3-D x/y/t satellite image timeseries and 4-D x/y/z/t weather data for retrieval and server-side processing through the Open Geospatial Consortium WCPS geo data cube query language standard. A data cube is also used in the field of imaging spectroscopy, since a spectrally-resolved image is represented as a three-dimensional volume. Earth observation data cubes combine satellite imagery such as Landsat 8 and Sentinel-2 with Geographic information system analytics. === Business intelligence === In online analytical processing (OLAP), data cubes are a common arrangement of business data suitable for analysis from different perspectives through operations like slicing, dicing, pivoting, and aggregation.

    Read more →
  • CPT Corporation

    CPT Corporation

    CPT Corporation was founded in 1971 by Dean Scheff in Minneapolis, Minnesota, with co-founders James Wienhold and Richard Eichhorn. CPT first designed, manufactured, and marketed the CPT 4200, a dual-cassette-tape machine that controlled a modified IBM Selectric typewriter to support text editing and word processing. The CPT 4200 was followed in 1976 by the CPT VM (Visual Memory), a partial-page display-screen dual-cassette-tape unit, and shortly thereafter by the CPT 8000, a full-page display dual-diskette desktop microcomputer that drove stand-alone daisy wheel printers. Subsequent products included (1) variants on the 8000 series; (2) the CPT 6000 series, which had a lower capacity, smaller screen, and was less expensive; (3) the CPT 9000 series, which had a larger capacity and could run IBM personal computer software; (4) the CPT Phoenix series, which had a graphical capabilities; (5) CPT PT, a software-only reduced version that ran on IBM personal computers and clones; and (6) other related products. The CPT logo—originally three letters chosen to sound well together—began to be taken as an acronym for "cassette powered typewriting," and subsequently for "computer processed text," and numerous other variants. Major competition was IBM, Wang, Lanier, Xerox, and other word processing vendors. CPT Corporation was fifth in size among Minnesota-based top high-tech companies, after 3M, Honeywell, Control Data, and Medtronic. Corporate revenues grew to approximately a quarter-billion dollars per year in the mid-1980s, then declined with the proliferation of personal computers. CPT ultimately ceased major manufacturing late in the 20th century. == Selected products == === Cassette based === The CPT 4200 was a dual-cassette-tape unit with a small built-in keyboard that controlled a modified IBM Selectric typewriter. Keystrokes entered on the typewriter appeared on the paper as they were recorded on the output cassette, which formed a magnetic replica of the characters printed on the page. That output cassette could later be used as an input cassette, where it would be played back to the typewriter along with new keystrokes to accomplish text editing. The keyboard of the CPT 4200 had action keys for "skip", "read" and "stop", mode keys for "word", "line", "paragraph," and "page." Pressing "read" transferred a word, line, paragraph, or page (depending on which mode key had been selected) from the input tape to both the typewriter and the output tape. Line boundaries (aka printer margins) recorded on the input tape were ignored or retained depending on whether or not the "adjust" key had been selected. Alternatively, pressing "skip" moved past the corresponding amount of text on the input tape without sending it to the typewriter or to the output tape. The Selectric's keyboard was active for any new typing, which would appear on the paper and transferred to the output tape. Thus a document was edited by reading back those parts of the text to be retained and skipping those parts to be discarded, with new typing added from the Selectric's keyboard. Price: approx. $5000, 1980-era values. The CPT Communicator was an add-on to the CPT 4200 that allowed data to be transferred from one text-editing machine to another, or between a text-editing machine and a remote computer, via phone lines. Price: not available. === Microprocessor based === ==== CPT 8000 series ==== The CPT 8000 was the company's first microcomputer product, exhibited in spring of 1976. It was a self-contained desktop machine with two 8-inch floppy diskette drives, a movable keyboard, and a full-page vertically oriented CRT display simulating paper with black characters on a white background, for a wysiwyg view of text on paper. It was promoted as familiar and easy to use for those experienced with typewriters. A keyboard with a large set of extra keys made operating the 8000 quite easy even for people without any computer skills or background. IN, OUT, PRINT, OOPS OOPS was changed thinking it was insulting to the buyer to assume they would ever make an error. The CPT 8000 was designed to show a full page of text with a static line showing the margin and tab stops. An additional line would display status or error messages with a times square like display. The times square error and status messages were very well done, "The printer needs a new ribbon" rather than "ERROR 034892". The text page could both smooth pan and scroll by the hardware in the display board and nothing quite like it existed for a very long time. The 8000 ran its own multitasking hardware interrupt-driven operating system but it also ran CP/M quite well. So unlike other companies that sold Wordprocessor only systems, CPT had a system that could run any of the many popular CP/M applications. Using the CP/M OS users could develop Fortran, CBasic, Cobol and other language's programs. The 8000 used Intel's 8080 microprocessor. The display board was bleeding-edge, high-speed logic. The parts available at this time were pushed to their limits to provide the speed needed to display this much text. There were times that batches of parts from one manufacturer simply could not be clocked as fast as the 8000 display required. Memory was initially 64K, but larger boards of 128K were most common then later 256K were offered. The 8080 accessed this additional RAM by running a custom page flipping circuit. The 8000 was originally priced at $8000 and its daisy wheel printer an additional $8000. The model number having been confused with the price at its first appearance at the Hanover fair. An RS-232 serial communication option was available for the 8000 series that allowed the electronic transfer of documents. One very popular use of this was to access the Westlaw system. A tempest approved version of the 8000 was developed that was RF tight with nothing being emitted that could be monitored or spied on. === Storage Systems === ==== CPT WordPak ==== The CPT WordPak series was CPT's first external document storage system that enabled multiple 8000 series workstations to store documents in an electronic filing cabinet. Prior to WordPak, all documents were stored on removable 8-inch floppy diskettes. Sharing documents involved handing off the original disk, or copying the document to a second disk and 'sneaker-net-ing' (walking it over) to the second 8000. But this resulted in two copies of the document, one at each workstation. A circuit board with a proprietary cable connector was installed in the 8000/6000 family of "workstations" and connected to the WordPak by a multi-conductor cable. WordPak 1 consisted of a single Shugart Associates SA4000 14"-diameter hard disk with a capacity of 30 megabytes. WordPak 2 added a 2nd drive for a total of 60 megabytes. ==== CPT SRS 45 ==== The CPT SRS 45 was what would now be called a server (quite likely the first of its kind) but in practice was much more. It was maybe the worlds easiest networking shared resource system. It combined a ZIP drive for backup and hard disk(s) that would be shared simultaneously by up to eight CPT machines that had the PC AT bus. The primary person responsible for its development was Bill Davidson whose wife Cheryl was responsible for bringing up CP/M, MP/M and other Digital Research products running on the Phoenix. The brilliance of the system were the networking cards that plugged into the individual machines. These used the 55AA installable driver of the IBM BIOS to simply add the zip and hard disk drives to each computers drives list. So a system that started with floppy drives A and B and a C hard disk on the machine would have the SRS 45 drives added as drives D (E, F depending on the number of hard disk) and Z for the zip drive. Sharing (avoiding writing to the same file at the same time) was handled by simply assigning parts of the drives for individuals and other directories for shared use. No "driver" software was needed. You simply plugged in the networking card and your machine had additional drives that were internal to the SRS45. This approach was far ahead of its time and sadly never recognized for its brilliance. The SRS45 as were all CPT machines not just dedicated Word Processors. === Personal-computer based === ==== CPT PT software ==== CPT PT was a reduced a version of the software that ran under MS-DOS as an application on IBM PC compatible computers. The corporation intended it as a bridge to allow data to flow in and out of personal computer packages, as well as providing a personal-computer word processing application for those familiar with standalone CPT equipment or who preferred the CPT style of dual-window text editing. Price: approx. $200, 1980-era values. ==== CPT Genius Display ==== The Genius display was a stand-alone, vertically-oriented (portrait) configuration monochrome grey-scale CRT monitor unit and an IBM PC form factor display card to allow high-resolution, full-page text & graphics on IBM PC compatible computers.

    Read more →
  • Wave Financial

    Wave Financial

    Wave is a Canadian company that provides financial services and software for small businesses. Wave is headquartered in the East Bayfront neighbourhood in Toronto, Canada. The company's first product was free online accounting software designed for businesses with 1–9 employees, followed by invoicing, personal finance and receipt-scanning software (OCR). In 2012, Wave began branching into financial services, initially with Payments by Wave (credit card processing) and Payroll by Wave, followed in February 2017 by Lending by Wave, which has since been discontinued. == History == CEO Kirk Simpson and CPO James Lochrie launched Wave Accounting Inc. in July 2009, Wave Accounting launched to the public on November 16, 2010. In June 2011, Series A funding led by OMERS Ventures was closed. In September 2011, FedDev Ontario invested one million dollars in funding. In October 2011, a $5-million investment led by U.S. venture capital firm Charles River Ventures was announced. In May 2012, Wave Accounting closed its series B financing round led by The Social+Capital Partnership, with follow-on participation from Charles River Ventures and OMERS Ventures. Wave acquired a company called Small Payroll in November 2011, which was later launched as a payroll product called Wave Payroll. In February 2012, Wave officially launched Wave Payroll to the public in Canada, followed by the American release in November of the same year. In August, 2012, the company announced the acquisition of Vuru.co, an online stock-tracking service. Terms of the deal were not disclosed. In December 2012, the company rebranded itself as Wave to emphasize its broadened spectrum of services. On March 14, 2019, the company acquired Every, a Toronto-based fintech company that provides business accounts and debit cards to small businesses. On June 11, 2019, the company announced it was being acquired by tax preparation company, H&R Block, for $537 million. On June 15, 2022, Wave announced that Kirk Simpson would be leaving and being replaced as CEO by Zahir Khoja. In May 2025, US customers of Wave were transitioned to a new Payroll processing system supported by CheckHQ. The new integration improved support for US employers by handling employer tax withholding and payments in all 50 US States. == Products == The company's initial product, Accounting by Wave, is a double entry accounting tool. Services include direct bank data imports, invoicing and expense tracking, customizable chart of accounts, and journal transactions. Accounting by Wave integrates with expense tracking software Shoeboxed and e-commerce website Etsy. The next product launched was Payroll by Wave, which was launched in 2012 after the acquisition of SmallPayroll.ca. Payroll by Wave is only available in the US and Canada. Invoicing by Wave is an offshoot of the company's earlier accounting tools. Additional products launched on or shortly after the company's rebrand in December 2012 include: a credit card processing tool, Payments by Wave, built initially on integration with Stripe credit card processing. However, Wave does not report merchant fees correctly for countries where Stripe charges a tax such as GST. In these cases, the merchant fees are reported without tax and do not match your Stripe account. a receipt scanning tool, Receipts by Wave. In 2017, Wave signed an agreement to provide its platform on RBC's online business banking site. The RBC-Wave service will be co-branded. == Taxes supported == The company's software supports tax-exclusive pricing, such as U.S. sales tax, where taxes are added on top of prices quoted. This has two effects: When scanning receipts users must manually add the tax, and input the amount. When making an invoice, users must put in a price before tax, and the system will add the tax on top. This makes Wave unable to handle taxes in countries like Australia where prices must be quoted inclusive of all taxes, such as GST. There is no way to set an invoice total and have Wave calculate the tax portion as a percentage. == Pricing and business model == As of June 10, 2024, Wave offers two tiers for its software: a free Starter plan with limitations on some features, and a paid Pro plan. In addition to its paid plan, revenue from the company comes from other paid financial services the company offers: Payments by Wave: Card processing which includes debit, credit and prepaid cards as well as ACH (bank payments) in the United States. Fees are a percentage of the transaction. Payroll by Wave: Monthly subscription fee plus usage fees. Wave previously included advertising on its pages as a source of revenue. Advertising was removed in January 2017. In 2017, Wave raised $24m (USD) in funding led by NAB Ventures. In 2019, H&R Block announced the acquisition of Wave in a cash deal worth $405 million USD.

    Read more →
  • Digital image correlation and tracking

    Digital image correlation and tracking

    Digital image correlation and tracking is an optical method that employs tracking and image registration techniques for accurate 2D and 3D measurements of changes in 2D images or 3D volumes. This method is often used to measure full-field displacement and strains, and it is widely applied in many areas of science and engineering. Compared to strain gauges and extensometers, digital image correlation methods provide finer details about deformation, due to the ability to provide both local and average data. == Overview == Digital image correlation (DIC) techniques have been increasing in popularity, especially in micro- and nano-scale mechanical testing applications due to their relative ease of implementation and use. Advances in computer technology and digital cameras have been the enabling technologies for this method and while white-light optics has been the predominant approach, DIC can be and has been extended to almost any imaging technology. The concept of using cross-correlation to measure shifts in datasets has been known for a long time, and it has been applied to digital images since at least the early 1970s. The present-day applications are almost innumerable, including image analysis, image compression, velocimetry, and strain estimation. Much early work in DIC in the field of mechanics was led by researchers at the University of South Carolina in the early 1980s and has been optimized and improved in recent years. Commonly, DIC relies on finding the maximum of the correlation array between pixel intensity array subsets on two or more corresponding images, which gives the integer translational shift between them. It is also possible to estimate shifts to a finer resolution than the resolution of the original images, which is often called "sub-pixel" registration because the measured shift is smaller than an integer pixel unit. For sub-pixel interpolation of the shift, other methods do not simply maximize the correlation coefficient. An iterative approach can also be used to maximize the interpolated correlation coefficient by using non-linear optimization techniques. The non-linear optimization approach tends to be conceptually simpler and can handle large deformations more accurately, but as with most nonlinear optimization techniques, it is slower. The two-dimensional discrete cross correlation r i j {\displaystyle r_{ij}} can be defined in several ways, one possibility being: r i j = ∑ m ∑ n [ f ( m + i , n + j ) − f ¯ ] [ g ( m , n ) − g ¯ ] ∑ m ∑ n [ f ( m , n ) − f ¯ ] 2 ∑ m ∑ n [ g ( m , n ) − g ¯ ] 2 . {\displaystyle r_{ij}={\frac {\sum _{m}\sum _{n}[f(m+i,n+j)-{\bar {f}}][g(m,n)-{\bar {g}}]}{\sqrt {\sum _{m}\sum _{n}{[f(m,n)-{\bar {f}}]^{2}}\sum _{m}\sum _{n}{[g(m,n)-{\bar {g}}]^{2}}}}}.} Here f(m, n) is the pixel intensity or the gray-scale value at a point (m, n) in the original image, g(m, n) is the gray-scale value at a point (m, n) in the translated image, f ¯ {\displaystyle {\bar {f}}} and g ¯ {\displaystyle {\bar {g}}} are mean values of the intensity matrices f and g respectively. However, in practical applications, the correlation array is usually computed using Fourier-transform methods, since the fast Fourier transform is a much faster method than directly computing the correlation. F = F { f } , G = F { g } . {\displaystyle \mathbf {F} ={\mathcal {F}}\{f\},\quad \mathbf {G} ={\mathcal {F}}\{g\}.} Then taking the complex conjugate of the second result and multiplying the Fourier transforms together elementwise, we obtain the Fourier transform of the correlogram, R {\displaystyle \ R} : R = F ∘ G ∗ , {\displaystyle R=\mathbf {F} \circ \mathbf {G} ^{},} where ∘ {\displaystyle \circ } is the Hadamard product (entry-wise product). It is also fairly common to normalize the magnitudes to unity at this point, which results in a variation called phase correlation. Then the cross-correlation is obtained by applying the inverse Fourier transform: r = F − 1 { R } . {\displaystyle \ r={\mathcal {F}}^{-1}\{R\}.} At this point, the coordinates of the maximum of r i j {\displaystyle r_{ij}} give the integer shift: ( Δ x , Δ y ) = arg ⁡ max ( i , j ) { r } . {\displaystyle (\Delta x,\Delta y)=\arg \max _{(i,j)}\{r\}.} == Deformation mapping == For deformation mapping, the mapping function that relates the images can be derived from comparing a set of subwindow pairs over the whole images. (Figure 1). The coordinates or grid points (xi, yj) and (xi, yj) are related by the translations that occur between the two images. If the deformation is small and perpendicular to the optical axis of the camera, then the relation between (xi, yj) and (xi, yj) can be approximated by a 2D affine transformation such as: x ∗ = x + u + ∂ u ∂ x Δ x + ∂ u ∂ y Δ y , {\displaystyle x^{}=x+u+{\frac {\partial u}{\partial x}}\Delta x+{\frac {\partial u}{\partial y}}\Delta y,} y ∗ = y + v + ∂ v ∂ x Δ x + ∂ v ∂ y Δ y . {\displaystyle y^{}=y+v+{\frac {\partial v}{\partial x}}\Delta x+{\frac {\partial v}{\partial y}}\Delta y.} Here u and v are translations of the center of the sub-image in the X and Y directions respectively. The distances from the center of the sub-image to the point (x, y) are denoted by Δ x {\displaystyle \Delta x} and Δ y {\displaystyle \Delta y} . Thus, the correlation coefficient rij is a function of displacement components (u, v) and displacement gradients ∂ u ∂ x , ∂ u ∂ y , ∂ v ∂ x , ∂ v ∂ y . {\displaystyle {\frac {\partial u}{\partial x}},{\frac {\partial u}{\partial y}},{\frac {\partial v}{\partial x}},{\frac {\partial v}{\partial y}}.} DIC has proven to be very effective at mapping deformation in macroscopic mechanical testing, where the application of specular markers (e.g. paint, toner powder) or surface finishes from machining and polishing provide the needed contrast to correlate images well. However, these methods for applying surface contrast do not extend to the application of free-standing thin films for several reasons. First, vapor deposition at normal temperatures on semiconductor grade substrates results in mirror-finish quality films with RMS roughnesses that are typically on the order of several nanometers. No subsequent polishing or finishing steps are required, and unless electron imaging techniques are employed that can resolve microstructural features, the films do not possess enough useful surface contrast to adequately correlate images. Typically this challenge can be circumvented by applying paint that results in a random speckle pattern on the surface, although the large and turbulent forces resulting from either spraying or applying paint to the surface of a free-standing thin film are too high and would break the specimens. In addition, the sizes of individual paint particles are on the order of μms, while the film thickness is only several hundred nanometers, which would be analogous to supporting a large boulder on a thin sheet of paper. == Digital volume correlation == Digital Volume Correlation (DVC, and sometimes called Volumetric-DIC) extends the 2D-DIC algorithms into three dimensions to calculate the full-field 3D deformation from a pair of 3D images. This technique is distinct from 3D-DIC, which only calculates the 3D deformation of an exterior surface using conventional optical images. The DVC algorithm is able to track full-field displacement information in the form of voxels instead of pixels. The theory is similar to above except that another dimension is added: the z-dimension. The displacement is calculated from the correlation of 3D subsets of the reference and deformed volumetric images, which is analogous to the correlation of 2D subsets described above. DVC can be performed using volumetric image datasets. These images can be obtained using confocal microscopy, X-ray computed tomography, Magnetic Resonance Imaging or other techniques. Similar to the other DIC techniques, the images must exhibit a distinct, high-contrast 3D "speckle pattern" to ensure accurate displacement measurement. DVC was first developed in 1999 to study the deformation of trabecular bone using X-ray computed tomography images. Since then, applications of DVC have grown to include granular materials, metals, foams, composites and biological materials. To date it has been used with images acquired by MRI imaging, Computer Tomography (CT), micro-CT, confocal microscopy, and lightsheet microscopy. DVC is currently considered to be ideal in the research world for 3D quantification of local displacements, strains, and stress in biological specimens. It is preferred because of the non-invasiveness of the method over traditional experimental methods. Two of the key challenges are improving the speed and reliability of the DVC measurement. The 3D imaging techniques produce noisier images than conventional 2D optical images, which reduces the quality of the displacement measurement. Computational speed is restricted by the file sizes of 3D images, which are significantly larger than 2D images. For example, an

    Read more →
  • Viewport

    Viewport

    A viewport is a polygon viewing region in computer graphics. In computer graphics theory, there are two region-like notions of relevance when rendering some objects to an image. In textbook terminology, the world coordinate window is the area of interest (meaning what the user wants to visualize) in some application-specific coordinates, e.g. miles, centimeters etc. The word window as used here should not be confused with the GUI window, i.e. the notion used in window managers. Rather it is an analogy with how a window limits what one can see outside a room. In contrast, the viewport is an area (typically rectangular) expressed in rendering-device-specific coordinates, e.g. pixels for screen coordinates, in which the objects of interest are going to be rendered. Clipping to the world-coordinates window is usually applied to the objects before they are passed through the window-to-viewport transformation. For a 2D object, the latter transformation is simply a combination of translation and scaling, the latter not necessarily uniform. An analogy of this transformation process based on traditional photography notions is to equate the world-clipping window with the camera settings and the variously sized prints that can be obtained from the resulting film image as possible viewports. Because the physical-device-based coordinates may not be portable from one device to another, a software abstraction layer known as normalized device coordinates is typically introduced for expressing viewports; it appears for example in the Graphical Kernel System (GKS) and later systems inspired from it. In 3D computer graphics, the viewport refers to the 2D rectangle used to project the 3D scene to the position of a virtual camera. A viewport is a region of the screen used to display a portion of the total image to be shown. In virtual desktops, the viewport is the visible portion of a 2D area which is larger than the visualization device. When viewing a document in a web browser, the viewport is the region of the browser window which contains the visible portion of the document. If the size of the viewport changes, for example as a result of the user resizing the browser window, then the browser may reflow the document (recalculate the locations and sizes of elements of the document). If the document is larger than the viewport, the user can control the portion of the document which is visible by scrolling in the viewport.

    Read more →
  • SciPy

    SciPy

    SciPy (pronounced "sigh pie") is a free and open-source Python library used for scientific computing and technical computing. SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, fast Fourier transform, signal and image processing, ordinary differential equation solvers and other tasks common in science and engineering. SciPy is also a family of conferences for users and developers of these tools: SciPy (in the United States), EuroSciPy (in Europe) and SciPy.in (in India). Enthought originated the SciPy conference in the United States and continues to sponsor many of the international conferences as well as host the SciPy website. The SciPy library is currently distributed under the BSD license, and its development is sponsored and supported by an open community of developers. It is also supported by NumFOCUS, a community foundation for supporting reproducible and accessible science. == Components == The SciPy package is at the core of Python's scientific computing capabilities. Available sub-packages include: cluster: hierarchical clustering, vector quantization, K-means constants: physical constants and conversion factors datasets: various example datasets for demonstrating image and data processing differentiate: numerical differentiation for first and second derivatives fft: Discrete Fourier Transform algorithms fftpack: Legacy interface for Discrete Fourier Transforms integrate: numerical integration routines interpolate: interpolation tools io: data input and output, including support for MATLAB and Matrix Market files linalg: linear algebra routines ndimage: various functions for multi-dimensional image processing odr: orthogonal distance regression classes and algorithms optimize: optimization algorithms including linear programming and a variety of numerical nonlinear programming optimizers signal: signal processing tools sparse: sparse matrices and related algorithms spatial: algorithms for spatial structures such as k-d trees, nearest neighbors, convex hulls, etc. special: special functions stats: statistical functions == Data structures == The basic data structure used by SciPy is a multidimensional array provided by the NumPy module. NumPy provides some functions for linear algebra, Fourier transforms, and random number generation, but not with the generality of the equivalent functions in SciPy. NumPy can also be used as an efficient multidimensional container of data with arbitrary datatypes. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases. Older versions of SciPy used Numeric as an array type, which is now deprecated in favor of the newer NumPy array code. == History == In the 1990s, Python was extended to include an array type for numerical computing called Numeric. (This package was eventually replaced by NumPy, which was written by Travis Oliphant in 2006 as a blending of Numeric and Numarray, with Numarray itself being started in 2001.) As of 2000, there was a growing number of extension modules and increasing interest in creating a complete environment for scientific and technical computing. In 2001, Travis Oliphant, Eric Jones, and Pearu Peterson merged code they had written and called the resulting package SciPy. The newly created package provided a standard collection of common numerical operations on top of the Numeric array data structure. Shortly thereafter, Fernando Pérez released IPython, an enhanced interactive shell widely used in the technical computing community, and John Hunter released the first version of Matplotlib, the 2D plotting library for technical computing. Since then the SciPy environment has continued to grow with more packages and tools for technical computing. == Scientific Python versus ScientificPython == In the scientific literature, SciPy is occasionally referred to as "Scientific Python (SciPy)". This is incorrect: the official name of the project is just "SciPy". Furthermore, expanding "SciPy" as "Scientific Python" may cause confusion with "ScientificPython", a project led by Konrad Hinsen of Orléans University that was active between 1995 and 2014. "Scientific Python" is also used for the related ecosystem of tools.

    Read more →
  • LakeFS

    LakeFS

    lakeFS is an open-source data version control system for managing data stored in object storage. It provides Git-like operations such as branching, committing, merging, and reverting for large-scale data stored in systems including Amazon S3, Azure Blob Storage, and Google Cloud Storage, as well as other S3-compatible object storage platforms. lakeFS is used in data engineering and machine learning workflows to manage changes to data, support reproducibility, and enable data governance across data lakes. The software is available as an open-source project, as well as in enterprise and managed service offerings, including lakeFS Cloud. == History == lakeFS was created in 2020 by Einat Orr and Oz Katz at Treeverse. Its first public release, version 0.8.1, appeared in August 2020 and introduced Git-style operations with support for Amazon S3. In 2021, Treeverse raised $23 million in a Series A funding round led by Dell Technologies Capital, Norwest Venture Partners, and Zeev Ventures. The same year, lakeFS was included in InfoWorld’s Best of Open Source Software (Bossie) awards. In June 2022, Treeverse introduced lakeFS Cloud, a managed service providing hosted lakeFS deployments for cloud-based data lakes. Version 1.0 was released in October 2023, adding integrations with platforms such as Databricks and Apache Iceberg, as well as support for orchestration tools including Apache Airflow. Public case studies and conference materials have described usage of lakeFS by organizations such as Microsoft, Volvo, and NASA. In July 2025, Treeverse announced an additional $20 million in growth funding to support further development of lakeFS. In November 2025, Treeverse announced the acquisition of the open-source data version control project DVC. == Software == === Overview === lakeFS provides Git-like operations such as branching, committing, merging, and reverting for datasets stored in object storage. These operations are used to manage changes to data, test modifications in isolation, reproduce specific data states, and recover from errors or unintended updates. === Architecture === lakeFS operates as a metadata layer on top of object storage systems such as Amazon S3, Azure Blob Storage, and Google Cloud Storage. It stores repository metadata describing commits, branches, and tags, enabling versioned views of data without copying underlying objects. The system provides access through multiple interfaces, including a web user interface, command-line tools, a REST API, and software development kits. It is designed to integrate with existing data engineering and machine learning workflows, and can be deployed either in self-hosted environments or as a managed service. === Functions === lakeFS provides version control functionality for data stored in object storage–based data lakes. Core features include: Atomic commits and version tracking for datasets, supporting reproducibility and auditability. Branching and merging mechanisms that allow isolated development and testing without duplicating data. Configurable hooks that can validate data or trigger external processes during commit and merge operations. The ability to revert repositories to earlier states to recover from data errors or failed changes. Recording of commit history and associated metadata for lineage tracking. Support for managing data across multiple object storage systems, including Amazon S3, Azure Blob Storage, Google Cloud Storage, and MinIO. Use of fixed data versions to reproduce experiments and machine learning model training. === Integrations === Coverage of lakeFS has described integrations with platforms such as Databricks and Apache Iceberg, as well as support for environments including Red Hat OpenShift. Additional materials describe its use with Trino, including validation of data changes prior to merging in versioned data workflows, as well as compatibility with orchestration tools such as Apache Airflow.

    Read more →