AI Data Farms

AI Data Farms — independent reviews, comparisons, pricing and step-by-step guides on Aizhi.

  • Cross-validation (statistics)

    Cross-validation (statistics)

    Cross-validation, sometimes called rotation estimation or out-of-sample testing, is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. Cross-validation includes resampling and sample splitting methods that use different portions of the data to test and train a model on different iterations. It is often used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. It can also be used to assess the quality of a fitted model and the stability of its parameters. In a prediction problem, a model is usually given a dataset of known data on which training is run (training dataset), and a dataset of unknown data (or first seen data) against which the model is tested (called the validation dataset or testing set). The goal of cross-validation is to test the model's ability to predict new data that was not used in estimating it, in order to flag problems like overfitting or selection bias and to give an insight on how the model will generalize to an independent dataset (i.e., an unknown dataset, for instance from a real problem). One round of cross-validation involves partitioning a sample of data into complementary subsets, performing the analysis on one subset (called the training set), and validating the analysis on the other subset (called the validation set or testing set). To reduce variability, in most methods multiple rounds of cross-validation are performed using different partitions, and the validation results are combined (e.g. averaged) over the rounds to give an estimate of the model's predictive performance. In summary, cross-validation combines (averages) measures of fitness in prediction to derive a more accurate estimate of model prediction performance. == Motivation == Assume a model with one or more unknown parameters, and a data set to which the model can be fit (the training data set). The fitting process optimizes the model parameters to make the model fit the training data as well as possible. If an independent sample of validation data is taken from the same population as the training data, it will generally turn out that the model does not fit the validation data as well as it fits the training data. The size of this difference is likely to be large especially when the size of the training data set is small, or when the number of parameters in the model is large. Cross-validation is a way to estimate the size of this effect. === Example: linear regression === In linear regression, there exist real response values y 1 , … , y n {\textstyle y_{1},\ldots ,y_{n}} , and n p-dimensional vector covariates x1, ..., xn. The components of the vector xi are denoted xi1, ..., xip. If least squares is used to fit a function in the form of a hyperplane ŷ = a + βTx to the data (xi, yi) 1 ≤ i ≤ n, then the fit can be assessed using the mean squared error (MSE). The MSE for given estimated parameter values a and β on the training set (xi, yi) 1 ≤ i ≤ n is defined as: MSE = 1 n ∑ i = 1 n ( y i − y ^ i ) 2 = 1 n ∑ i = 1 n ( y i − a − β T x i ) 2 = 1 n ∑ i = 1 n ( y i − a − β 1 x i 1 − ⋯ − β p x i p ) 2 {\displaystyle {\begin{aligned}{\text{MSE}}&={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-{\hat {y}}_{i})^{2}={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-a-{\boldsymbol {\beta }}^{T}\mathbf {x} _{i})^{2}\\&={\frac {1}{n}}\sum _{i=1}^{n}(y_{i}-a-\beta _{1}x_{i1}-\dots -\beta _{p}x_{ip})^{2}\end{aligned}}} If the model is correctly specified, it can be shown under mild assumptions that the expected value of the MSE for the training set is (n − p − 1)/(n + p + 1) < 1 times the expected value of the MSE for the validation set (the expected value is taken over the distribution of training sets). Thus, a fitted model and computed MSE on the training set will result in an optimistically biased assessment of how well the model will fit an independent data set. This biased estimate is called the in-sample estimate of the fit, whereas the cross-validation estimate is an out-of-sample estimate. Since in linear regression it is possible to directly compute the factor (n − p − 1)/(n + p + 1) by which the training MSE underestimates the validation MSE under the assumption that the model specification is valid, cross-validation can be used for checking whether the model has been overfitted, in which case the MSE in the validation set will substantially exceed its anticipated value. (Cross-validation in the context of linear regression is also useful in that it can be used to select an optimally regularized cost function.) === General case === In most other regression procedures (e.g. logistic regression), there is no simple formula to compute the expected out-of-sample fit. Cross-validation is, thus, a generally applicable way to predict the performance of a model on unavailable data using numerical computation in place of theoretical analysis. == Types == Two types of cross-validation can be distinguished: exhaustive and non-exhaustive cross-validation. === Exhaustive cross-validation === Exhaustive cross-validation methods are cross-validation methods which learn and test on all possible ways to divide the original sample into a training and a validation set. ==== Leave-p-out cross-validation ==== Leave-p-out cross-validation (LpO CV) involves using p observations as the validation set and the remaining observations as the training set. This is repeated on all ways to cut the original sample on a validation set of p observations and a training set. LpO cross-validation require training and validating the model C p n {\displaystyle C_{p}^{n}} times, where n is the number of observations in the original sample, and where C p n {\displaystyle C_{p}^{n}} is the binomial coefficient. For p > 1 and for even moderately large n, LpO CV can become computationally infeasible. For example, with n = 100 and p = 30, C 30 100 ≈ 3 × 10 25 . {\displaystyle C_{30}^{100}\approx 3\times 10^{25}.} A variant of LpO cross-validation with p=2 known as leave-pair-out cross-validation has been recommended as a nearly unbiased method for estimating the area under ROC curve of binary classifiers. ==== Leave-one-out cross-validation ==== Leave-one-out cross-validation (LOOCV) is a particular case of leave-p-out cross-validation with p = 1. The process looks similar to jackknife; however, with cross-validation one computes a statistic on the left-out sample(s), while with jackknifing one computes a statistic from the kept samples only. LOO cross-validation requires less computation time than LpO cross-validation because there are only C 1 n = n {\displaystyle C_{1}^{n}=n} passes rather than C p n {\displaystyle C_{p}^{n}} . However, n {\displaystyle n} passes may still require quite a large computation time, in which case other approaches such as k-fold cross validation may be more appropriate. Pseudo-code algorithm: Input: x, {vector of length N with x-values of incoming points} y, {vector of length N with y-values of the expected result} interpolate( x_in, y_in, x_out ), { returns the estimation for point x_out after the model is trained with x_in-y_in pairs} Output: err, {estimate for the prediction error} Steps: err ← 0 for i ← 1, ..., N do // define the cross-validation subsets x_in ← (x[1], ..., x[i − 1], x[i + 1], ..., x[N]) y_in ← (y[1], ..., y[i − 1], y[i + 1], ..., y[N]) x_out ← x[i] y_out ← interpolate(x_in, y_in, x_out) err ← err + (y[i] − y_out)^2 end for err ← err/N === Non-exhaustive cross-validation === Non-exhaustive cross validation methods do not compute all ways of splitting the original sample. These methods are approximations of leave-p-out cross-validation. ==== k-fold cross-validation ==== In k-fold cross-validation, the original sample is randomly partitioned into k equal sized subsamples, often referred to as "folds". Of the k subsamples, a single subsample is retained as the validation data for testing the model, and the remaining k − 1 subsamples are used as training data. The cross-validation process is then repeated k times, with each of the k subsamples used exactly once as the validation data. The k results can then be averaged to produce a single estimation. The advantage of this method over repeated random sub-sampling (see below) is that all observations are used for both training and validation, and each observation is used for validation exactly once. 10-fold cross-validation is commonly used, but in general k remains an unfixed parameter. For example, setting k = 2 results in 2-fold cross-validation. In 2-fold cross-validation, the dataset is randomly shuffled into two sets d0 and d1, so that both sets are equal size (this is usually implemented by shuffling the data array and then splitting it in two). We then train on d0 and validate on d1, followed by training on d1 and validating on d0. When k = n (the number of observations), k-fold cross-validation is equivalent to leave-one-out cr

    Read more →
  • Yorba (software)

    Yorba (software)

    Yorba is a web-based personal information management platform for finding, monitoring, or deleting online accounts and subscriptions. Yorba is a participating member of Consumer Reports’ Data Rights Protocol (DRP) consortium that develops open technical standards for exercising consumer data rights under laws including the California Consumer Privacy Act. == History == Yorba began as a research project around 2021. It was founded by Chris Zeunstrom (CEO), Nolan Cabeje (CDO) and David Schmudde (CTO). Zeunstrom says he began developing Yorba after growing frustrated with managing numerous email accounts, noting overloaded inboxes create distraction and potential security vulnerabilities. Yorba’s early development was also influenced by security issues he encountered at a previous company, which had been affected by data breaches at a time when such incidents were becoming increasingly common. In 2023, Yorba launched a private beta as a public benefit corporation funded through a give-back model operated by Zeunstrom's New York-based design firm, Ruca. In January 2024, Yorba entered public beta and reported over 1,000 users, including 160 premium subscribers. At the time of the public beta launch, Yorba integrated with Gmail and announced plans to expand compatibility to other online services and cloud storage providers. In September 2024, Yorba completed conformance testing under the Data Rights Protocol, an initiative developed by Consumer Reports, to establish a standard and open-source framework for securely transmitting consumer data rights requests under laws like the California Consumer Privacy Act. Yorba was named among twelve participating companies that implemented the protocol alongside OneTrust and Consumer Reports’ own Permission Slip app. Yorba was one of nine startups selected as 2025 finalist in the Santander X Global Awards international entrepreneurship competition. == Features == Yorba scans user inbox history data to identify online accounts, mailing lists, and possible data breaches. It uses natural language processing and machine learning to identify a user's accounts, services, and subscriptions. The platform prompts password resets for compromised accounts and locates unused accounts. The platform also supports mailing list management by identifying and helping users unsubscribe from newsletters. Paid subscribers can locate and cancel recurring charges. Yorba links with financial institutions in the U.S., Canada, and EU via Plaid Inc. to detect recurring charges and delete unwanted subscriptions. == Privacy and Ethics == Yorba's founder has openly criticized dark patterns that make canceling services difficult, citing personal frustration with inbox clutter as part of his inspiration for Yorba. Yorba offers privacy policy analysis in partnership with Amsterdam-based nonprofit Terms of Service; Didn’t Read, assigning grades based on invasiveness or ethical concerns. As of 2024, the company described its pricing as designed to cover operational costs and sustain the platform without outside investment.

    Read more →
  • Seed (programming)

    Seed (programming)

    Seed is a JavaScript interpreter and a library of the GNOME project to create standalone applications in JavaScript. It uses the JavaScript engine JavaScriptCore of the WebKit project. It is possible to easily create modules in C. Seed is integrated in GNOME since the 2.28 version and is used by two games in the GNOME Games package. It is also used by the Web web browser for the design of its extensions. The module is also officially supported by the GTK+ project. == Hello world in Seed == This example uses the standard output to output the string "Hello, World". == A program using GTK+ == This code shows an empty window named "Example". == Modules == To use a module, just instantiate a class having for name imports. followed by the name of the module respecting the case sensitivity. The modules using GObject Introspection, who starts by imports.gi. : Gtk Gst GObject Gio Clutter GLib Gdk WebKit GdkPixbuf, GdkPixbuf Libxml Cairo DBus MPFR Os (system library) Canvas (using Cairo) multiprocessing readline Archived 2009-11-09 at the Wayback Machine ffi sqlite sandbox Archived 2009-11-09 at the Wayback Machine == List of the Seed versions == The names of the versions of Seed are albums of famous rock bands.

    Read more →
  • ShareMethods

    ShareMethods

    ShareMethods is a Web 2.0 document management and collaboration service with a focus on sales, marketing, and the extended selling network. It offers a software as a service (SaaS) subscription to companies and is available as a stand-alone application or as an integrated program with CRM tools such as Oracle CRM On Demand or salesforce.com. == History == ShareMethods was launched in 2004 to provide collaboration and communication services for sales and marketing teams, business partners, and customers. The founders have a background of building software-as-a-service applications and creating digital media applications. In September 2005, ShareMethods launched "ShareNow" as one of the first applications on the salesforce.com AppExchange. In September 2006, ShareMethods moved its operations into a SAS 70 Type II data center owned by SunGard. In March 2009, ShareMethods launched "ShareSpaces" to provide on-demand portals or workspaces. In 2013, ShareMethods announced that its platform is available in a private cloud (on-premises) version. == Products == ShareMethods: Combines document management, collaboration, analytics, and CRM integration into a single solution. Key content can be centrally managed and delivered to sales channels, while providing feedback to marketing. ShareMethods is often used as a sales portal for internal sales and a partner portal for external partners. ShareNow: Integrates ShareMethods with salesforce.com providing Single Sign On for salesforce.com users and access to files related to accounts opportunities, etc. including custom objects. Also facilitates collaboration between salesforce.com users and non-users. ShareMethods for Oracle CRM On Demand: Integrates ShareMethods with Oracle CRM On Demand providing Single Sign On for Oracle users and easy access to files related to accounts opportunities, etc. ShareOffice: An on-demand intranet/extranet solution. Features include full-text search, version history, server sync-up, email updates, audit trail/analytics, check-in/check-out, multilingual user interface. ShareSpaces: Independent workspaces or portals where users can collaborate with business partners, teammates, or individuals to work together on content and documents. == Integration and interoperability == ShareMethods is available on Salesforce.com's AppExchange platform. ShareMethods also integrates with Oracle CRM On Demand to provide document management within the CRM application. Customers also can integrate proprietary systems via single-sign-on and self-registration. In addition, developers can make use of the ShareMethods API based on WebDAV to integrate document management functionality.

    Read more →
  • Cloud manufacturing

    Cloud manufacturing

    Cloud manufacturing (CMfg) is a new manufacturing paradigm developed from existing advanced manufacturing models (e.g., ASP, AM, NM, MGrid) and enterprise information technologies under the support of cloud computing, Internet of Things (IoT), virtualization and service-oriented technologies, and advanced computing technologies. It transforms manufacturing resources and manufacturing capabilities into manufacturing services, which can be managed and operated in an intelligent and unified way to enable the full sharing and circulating of manufacturing resources and manufacturing capabilities. CMfg can provide safe and reliable, high quality, cheap and on-demand manufacturing services for the whole lifecycle of manufacturing. The concept of manufacturing here refers to big manufacturing that includes the whole lifecycle of a product (e.g. design, simulation, production, test, maintenance). The concept of Cloud manufacturing was initially proposed by the research group led by Prof. Bo Hu Li and Prof. Lin Zhang in China in 2010. Related discussions and research were conducted hereafter, and some similar definitions (e.g. Cloud-Based Design and Manufacturing (CBDM). ) to cloud manufacturing were introduced. Cloud manufacturing is a type of parallel, networked, and distributed system consisting of an integrated and inter-connected virtualized service pool (manufacturing cloud) of manufacturing resources and capabilities as well as capabilities of intelligent management and on-demand use of services to provide solutions for all kinds of users involved in the whole lifecycle of manufacturing. == Types == Cloud Manufacturing can be divided into two categories. The first category concerns deploying manufacturing software on the Cloud, i.e. a “manufacturing version” of Computing. CAx software can be supplied as a service on the Manufacturing Cloud (MCloud). The second category has a broader scope, cutting across production, management, design and engineering abilities in a manufacturing business. Unlike with computing and data storage, manufacturing involves physical equipment, monitors, materials and so on. In this kind of Cloud Manufacturing system, both material and non-material facilities are implemented on the Manufacturing Cloud to support the whole supply chain. Costly resources are shared on the network. This means that the utilisation rate of rarely used equipment rises and the cost of expensive equipment is reduced. According to the concept of Cloud technology, there will not be direct interaction between Cloud Users and Service Providers. The Cloud User should neither manage nor control the infrastructure and manufacturing applications. As a matter of fact, the former can be considered part of the latter. In CMfg system, various manufacturing resources and abilities can be intelligently sensed and connected into wider Internet, and automatically managed and controlled using IoT technologies (e.g., RFID, wired and wireless sensor network, embedded system). Then the manufacturing resources and abilities are virtualized and encapsulated into different manufacturing cloud services (MCSs), that can be accessed, invoked, and deployed based on knowledge by using virtualization technologies, service-oriented technologies, and cloud computing technologies. The MCSs are classified and aggregated according to specific rules and algorithms, and different kinds of manufacturing clouds are constructed. Different users can search and invoke the qualified MCSs from related manufacturing cloud according to their needs, and assemble them to be a virtual manufacturing environment or solution to complete their manufacturing task involved in the whole life cycle of manufacturing processes under the support of cloud computing, service-oriented technologies, and advanced computing technologies. Four types of cloud deployment modes (public, private, community and hybrid clouds) are ubiquitous as a single point of access. Private cloud refers to a centralized management effort in which manufacturing services are shared within one company or its subsidiaries. Enterprises' mission-critical and core-business applications are often kept in a private cloud. Community cloud is a collaborative effort in which manufacturing services are shared between several organizations from a specific community with common concerns. Public cloud realizes the key concept of sharing services with the general public in a multi-tenant environment. Hybrid cloud is a composition of two or more clouds (private, community or public) that remain distinct entities but are also bound together, offering the benefits of multiple deployment modes. == Resources == From the resource’s perspective, each kind of manufacturing capability requires support from the related manufacturing resource. For each type of manufacturing capability, its related manufacturing resource comes in two forms, soft resources and hard resources. === Soft resources === Software: software applications throughout the product lifecycle including design, analysis, simulation, process planning, and are only beginning to be embraced by the electronics manufacturing industry. Knowledge: experience and know-how needed to complete a production task, i.e. engineering knowledge, product models, standards, evaluation procedures and results, customer feedback, and manufacturing in the cloud provides just as many solutions as the number of questions it also raises for manufacturing executives wanting to make the best possible decision. Skill: expertise in performing a specific manufacturing task. Personnel: human resource engaged in the manufacturing process, i.e. designers, operators, managers, technicians, project teams, customer service, etc. Experience: performance, quality, client evaluation, etc. Business Network: business relationships and business opportunity networks that exist in an enterprise. === Hard resources === Manufacturing Equipment: facilities needed for completing a manufacturing task, e.g. machine tools, cutters, test and monitoring equipment and other fabrication tools. Monitoring/Control Resource: devices used to identify and control other manufacturing resource, for instance, RFID (Radio-Frequency IDentification), WSN (Wireless Sensor Network), virtual managers and remote controllers. Computational Resource: computing devices to support production process, e.g. servers, computers, storage media, control devices, etc. Materials: inputs and outputs in a production system, e.g. raw material, product-in-progress, finished product, power, water, lubricants, etc. Storage: automated storage and retrieval systems, logic controllers, location of warehouses, volume capacity and schedule/optimization methods. Transportation: movement of manufacturing inputs/outputs from one location to another. It includes the modes of transport, e.g. air, rail, road, water, cable, pipeline and space, and the related price, and time taken.

    Read more →
  • Bring your own encryption

    Bring your own encryption

    Bring your own encryption (BYOE), also known as bring your own key (BYOK), is a cloud computing security model that allows cloud service customers to use their own encryption software and manage their own encryption keys. == Overview == BYOE enables cloud service customers to utilize a virtual instance of their encryption software alongside their cloud-hosted business applications to encrypt their data. In this model, hosted business applications are configured to process all data through the encryption software. This software then writes the ciphertext version of the data to the cloud service provider's physical data store and decrypts ciphertext data upon retrieval requests. This approach provides enterprises with control over their keys and the ability to generate their own master key using internal hardware security modules (HSM), which are then transmitted to the cloud provider's HSM. When the data is no longer needed, such as when users discontinue the cloud service, the keys can be deleted, rendering the encrypted data permanently inaccessible. This practice is known as crypto-shredding. == Potential Advantages == Organizations can store data with unique encryption that only they can access. Multiple organizations can share the same hardware infrastructure via cloud services like Amazon Web Services (AWS) or Google Cloud while maintaining encryption to comply with regulations such as HIPAA. == Potential Challenges == Resource utilization may be higher compared to traditional encryption practices when multiple users share the same hardware and use their own encryption. Efforts to minimize resource utilization issues may potentially impact security benefits.

    Read more →
  • ShareMethods

    ShareMethods

    ShareMethods is a Web 2.0 document management and collaboration service with a focus on sales, marketing, and the extended selling network. It offers a software as a service (SaaS) subscription to companies and is available as a stand-alone application or as an integrated program with CRM tools such as Oracle CRM On Demand or salesforce.com. == History == ShareMethods was launched in 2004 to provide collaboration and communication services for sales and marketing teams, business partners, and customers. The founders have a background of building software-as-a-service applications and creating digital media applications. In September 2005, ShareMethods launched "ShareNow" as one of the first applications on the salesforce.com AppExchange. In September 2006, ShareMethods moved its operations into a SAS 70 Type II data center owned by SunGard. In March 2009, ShareMethods launched "ShareSpaces" to provide on-demand portals or workspaces. In 2013, ShareMethods announced that its platform is available in a private cloud (on-premises) version. == Products == ShareMethods: Combines document management, collaboration, analytics, and CRM integration into a single solution. Key content can be centrally managed and delivered to sales channels, while providing feedback to marketing. ShareMethods is often used as a sales portal for internal sales and a partner portal for external partners. ShareNow: Integrates ShareMethods with salesforce.com providing Single Sign On for salesforce.com users and access to files related to accounts opportunities, etc. including custom objects. Also facilitates collaboration between salesforce.com users and non-users. ShareMethods for Oracle CRM On Demand: Integrates ShareMethods with Oracle CRM On Demand providing Single Sign On for Oracle users and easy access to files related to accounts opportunities, etc. ShareOffice: An on-demand intranet/extranet solution. Features include full-text search, version history, server sync-up, email updates, audit trail/analytics, check-in/check-out, multilingual user interface. ShareSpaces: Independent workspaces or portals where users can collaborate with business partners, teammates, or individuals to work together on content and documents. == Integration and interoperability == ShareMethods is available on Salesforce.com's AppExchange platform. ShareMethods also integrates with Oracle CRM On Demand to provide document management within the CRM application. Customers also can integrate proprietary systems via single-sign-on and self-registration. In addition, developers can make use of the ShareMethods API based on WebDAV to integrate document management functionality.

    Read more →
  • Django (web framework)

    Django (web framework)

    Django ( JANG-goh; sometimes stylized as django) is a free and open-source, Python-based web framework that runs on a web server. It follows the model–template–views (MTV) architectural pattern. It is maintained by the Django Software Foundation (DSF), an independent organization established in the US as a 501(c)(3) non-profit. Django's primary goal is to ease the creation of complex, database-driven websites. The framework emphasizes reusability and "pluggability" of components, less code, low coupling, rapid development, and the principle of don't repeat yourself. Python is used throughout, even for settings, files, and data models. Django also provides an optional administrative create, read, update and delete interface that is generated dynamically through introspection and configured via admin models. Some well-known sites that use Django include Instagram, Mozilla, Disqus, Bitbucket, Nextdoor, and Clubhouse. == History == Django was created in the autumn of 2003, when the web programmers at the Lawrence Journal-World newspaper, Adrian Holovaty and Simon Willison, began using Python to build applications. Jacob Kaplan-Moss was hired early in Django's development shortly before Willison's internship ended. It was released publicly under a BSD license in July 2005. The framework was named after guitarist Django Reinhardt. Holovaty is a romani jazz guitar player inspired in part by Reinhardt's music. In June 2008, it was announced that a newly formed Django Software Foundation (DSF) would maintain Django in the future. == Features == === Components === Despite having its own nomenclature, such as naming the callable objects generating the HTTP responses "views", the core Django framework can be seen as an MVC architecture. It consists of an object-relational mapper (ORM) that mediates between data models (defined as Python classes) and a relational database ("Model"), a system for processing HTTP requests with a web templating system ("View"), and a regular-expression-based URL dispatcher ("Controller"). Also included in the core framework are: a lightweight and standalone web server for development and testing a form serialization and validation system that can translate between HTML forms and values suitable for storage in the database a template system that utilizes the concept of inheritance borrowed from object-oriented programming a caching framework that can use any of several cache methods support for middleware classes that can intervene at various stages of request processing and carry out custom functions an internal dispatcher system that allows components of an application to communicate events to each other via pre-defined signals an internationalization system, including translations of Django's own components into a variety of languages a serialization system that can produce and read XML and/or JSON representations of Django model instances a system for extending the capabilities of the template engine an interface to Python's built-in unit test framework === Bundled applications === The main Django distribution also bundles a number of applications in its "contrib" package, including: an extensible authentication system the dynamic administrative interface tools for generating RSS and Atom syndication feeds a "Sites" framework that allows one Django installation to run multiple websites, each with their own content and applications tools for generating Sitemaps built-in mitigation for cross-site request forgery, cross-site scripting, SQL injection, password cracking and other typical web attacks, most of them turned on by default a framework for creating geographic information system (GIS) applications === Extensibility === Django's configuration system allows third-party code to be plugged into a regular project, provided that it follows the reusable app conventions. More than 5000 packages are available to extend the framework's original behavior, providing solutions to issues the original tool didn't tackle: registration, search, API provision and consumption, CMS, etc. This extensibility is, however, mitigated by internal components' dependencies. While the Django philosophy implies loose coupling, the template filters and tags assume one engine implementation, and both the auth and admin bundled applications require the use of the internal ORM. None of these filters or bundled apps are mandatory to run a Django project, but reusable apps tend to depend on them, encouraging developers to keep using the official stack in order to benefit fully from the apps ecosystem. === Server arrangements === Django can be run on ASGI or WSGI-compliant web servers. Django officially supports five database backends: PostgreSQL, MySQL, MariaDB, SQLite, and Oracle. Microsoft SQL Server can be used with mssql-django. == Version history == The Django team will occasionally designate certain releases to be "long-term support" (LTS) releases. LTS releases will get security and data loss fixes applied for a guaranteed period of time, typically 3+ years, regardless of the pace of releases afterwards. == Community == === DjangoCon === There is a semiannual conference for Django developers and users, named "DjangoCon", that has been held since September 2008. DjangoCon is held annually in Europe, in May or June; while another is held in the United States in August or September, in various cities. ==== United States ==== The 2012 DjangoCon took place in Washington, D.C., from September 3 to 8. 2013 DjangoCon was held in Chicago at the Hyatt Regency Hotel and the post-conference Sprints were hosted at Digital Bootcamp, computer training center. The 2014 DjangoCon US returned to Portland, OR from August 30 to 6 September. The 2015 DjangoCon US was held in Austin, TX from September 6 to 11 at the AT&T Executive Center. The 2016 DjangoCon US was held in Philadelphia, PA at The Wharton School of the University of Pennsylvania from July 17 to 22. The 2017 DjangoCon US was held in Spokane, WA; in 2018 DjangoCon US was held in San Diego, CA. DjangoCon US 2019 was held again in San Diego, CA from September 22 to 27. DjangoCon 2021 took place virtually and in 2022, DjangoCon US returned to San Diego from October 16 to 21. DjangoCon US 2023 was held from October 16 to 20 at the Durham, NC convention center and DjangoCon US 2024 took place also in Durham in September 22 to 27. DjangoCon US 2025 was held from September 8 to 12 in Chicago, Illinois. ==== Europe ==== The 2025 edition of DjangoCon Europe took place in Dublin, Ireland from 23 to 27 April. In 2024, the conference was hosted in Vigo, Spain. Edinburgh, Scotland served as the venue for DjangoCon Europe in 2023. The 2022 conference was organized in Porto, Portugal. In 2021, DjangoCon Europe was held virtually due to the COVID-19 pandemic. The 2020 edition was also conducted as a fully virtual event. DjangoCon Europe 2019 was held in Copenhagen, Denmark. In 2018, the event took place in Heidelberg, Germany. The 2017 conference was convened in Florence, Italy. DjangoCon Europe 2012 was organized in Zurich, Switzerland. ==== Australia ==== Django mini-conferences are usually held every year as part of the Australian Python Conference 'PyCon AU'. Previously, these mini-conferences have been held in: Hobart, Australia, in July 2013, Brisbane, Australia, in August 2014 and 2015, Melbourne, Australia in August 2016 and 2017, and Sydney, Australia, in August 2018 and 2019. ==== Africa ==== The first DjangoCon Africa was held in Zanzibar, Tanzania, from 6 to 11 November 2023. The event hosted approximately 200 attendees from 22 countries, including 103 women. The conference featured 26 talks on topics such as software development, education, careers, accessibility, and agriculture, often highlighting perspectives from across the African continent. Future editions of the conference are planned, with details available on the official website === Community groups & programs === Django has spawned user groups and meetups around the world, a notable group is the Django Girls organization, which began in Poland but now has had events in 91 countries. Another initiative is Djangonaut Space, a mentorship program aimed at supporting new contributors to the Django ecosystem. The program pairs experienced mentors with developers to guide them through making meaningful contributions to Django and its community. It emphasizes long-term engagement, inclusion, and collaborative open-source development. == Ports to other languages == Programmers have ported Django's template engine design from Python to other languages, providing decent cross-platform support. Some of these options are more direct ports; others, though inspired by Django and retaining its concepts, take the liberty to deviate from Django's design: Liquid for Ruby Template::Swig for Perl Twig for PHP and JavaScript Jinja for Python ErlyDTL for Erlang == CMSs based on Django Framework == Django as a framework is capable of building a complete CMS

    Read more →
  • Distributed manufacturing

    Distributed manufacturing

    Distributed manufacturing, also known as distributed production, cloud producing, distributed digital manufacturing, and local manufacturing, is a form of decentralized manufacturing practiced by enterprises using a network of geographically dispersed manufacturing facilities that are coordinated using information technology. It can also refer to local manufacture via the historic cottage industry model, or manufacturing that takes place in the homes of consumers. == Enterprise == In enterprise environments, the primary attribute of distributed manufacturing is the ability to create value at geographically dispersed locations. For example, shipping costs could be minimized when products are built geographically close to their intended markets. Also, products manufactured in a number of small facilities distributed over a wide area can be customized with details adapted to individual or regional tastes. Manufacturing components in different physical locations and then managing the supply chain to bring them together for final assembly of a product is also considered a form of distributed manufacturing. Digital networks combined with additive manufacturing allow companies a decentralized and geographically independent distributed production (cloud manufacturing). == Consumer == Within the maker movement and DIY culture, small scale production by consumers often using peer-to-peer resources is being referred to as distributed manufacturing. Consumers download digital designs from an open design repository website like Youmagine or Thingiverse and produce a product for low costs through a distributed network of 3D printing services such as 3D Hubs, Geomiq. In the most distributed form of distributed manufacturing the consumer becomes a prosumer and manufacturers products at home with an open-source 3-D printer such as the RepRap. In 2013 a desktop 3-D printer could be economically justified as a personal product fabricator and the number of free and open hardware designs were growing exponentially. Today there are millions of open hardware product designs at hundreds of repositories and there is some evidence consumers are 3-D printing to save money. For example, 2017 case studies probed the quality of: (1) six common complex toys; (2) Lego blocks; and (3) the customizability of open source board games and found that all filaments analyzed saved the prosumer over 75% of the cost of commercially available true alternative toys and over 90% for recyclebot filament. Overall, these results indicate a single 3D printing repository, MyMiniFactory, is saving consumers well over $60 million/year in offset purchases of only toys. These 3-D printers can now be used to make sophisticated high-value products like scientific instruments. Similarly, a study in 2022 found that 81% of open source designs provided economic savings and the total savings for the 3D printing community is more than $35 million from downloading only the top 100 products at YouMagine. In general, the savings are largest when compared to conventional products when prosumers use recycled materials in 'distributed recycling and additive manufacturing' (DRAM). == Emergency Distributed Manufacturing During COVID-19 Pandemic == Distributed manufacturing became far more visible during the COVID-19 pandemic because it offered a practical response to the breakdown of centralized global supply chains. As lock downs, border restrictions, and factory shutdowns disrupted conventional production, decentralized networks using local facilities such as Open Source Medical Supplies stepped in and manufactured over 48 million products. Additive manufacturing /3D printing were used to produce urgently needed items such as face shields, ventilators and their components, nasopharyngeal swabs, and other personal protective equipment. This demonstrated that distributed manufacturing could reduce lead times, improve responsiveness, and lessen dependence on distant suppliers during crisis conditions for a wide range of products. Peer-reviewed studies on pandemic-era manufacturing note that additive manufacturing was especially valuable because digital design files could be shared rapidly and produced close to the point of need, enabling hospitals, universities, small firms, and maker communities to supplement strained medical supply chains. The pandemic also helped shift distributed manufacturing from being seen as a niche or experimental model to a credible strategy for resilience, flexibility, and emergency response. At the same time, scholars caution that its wider adoption depends on solving issues related to quality assurance, regulation, material consistency, and coordination across distributed production sites. Overall, COVID-19 popularized distributed manufacturing by showing that localized, digitally enabled production could complement traditional manufacturing systems when speed, adaptability, and supply-chain resilience were critical. == Social change == Some call attention to the conjunction of commons-based peer production with distributed manufacturing techniques. The self-reinforced fantasy of a system of eternal growth can be overcome with the development of economies of scope, and here, the civil society can play an important role contributing to the raising of the whole productive structure to a higher plateau of more sustainable and customised productivity. Further, it is true that many issues, problems and threats rise due to the large democratization of the means of production, and especially regarding the physical ones. For instance, the recyclability of advanced nanomaterials is still questioned; weapons manufacturing could become easier; not to mention the implications on counterfeiting and on "intellectual property". It might be maintained that in contrast to the industrial paradigm whose competitive dynamics were about economies of scale, commons-based peer production and distributed manufacturing could develop economies of scope. While the advantages of scale rest on cheap global transportation, the economies of scope share infrastructure costs (intangible and tangible productive resources), taking advantage of the capabilities of the fabrication tools. And following Neil Gershenfeld in that "some of the least developed parts of the world need some of the most advanced technologies", commons-based peer production and distributed manufacturing may offer the necessary tools for thinking globally but act locally in response to certain problems and needs. As well as supporting individual personal manufacturing social and economic benefits are expected to result from the development of local production economies. In particular, the humanitarian and development sector are becoming increasingly interested in how distributed manufacturing can overcome the supply chain challenges of last mile distribution. Further, distributed manufacturing has been proposed as a key element in the Cosmopolitan localism or cosmolocalism framework to reconfigure production by prioritizing socio-ecological well-being over corporate profits, over-production and excess consumption. == Technology == By localizing manufacturing, distributed manufacturing may enable a balance between two opposite extreme qualities in technology development: Low technology and High tech. This balance is understood as an inclusive middle, a "mid-tech", that may go beyond the two polarities, incorporating them into a higher synthesis. Thus, in such an approach, low-tech and high-tech stop being mutually exclusive. They instead become a dialectic totality. Mid-tech may be abbreviated to "both…and…" instead of "neither…nor…". Mid-tech combines the efficiency and versatility of digital/automated technology with low-tech's potential for autonomy and resilience. == Contracting in Distributed Manufacturing == Research into contracting and order processing models tailored for distributed manufacturing has highlighted the need for flexible, role-based frameworks and advanced digital tools. These tools and frameworks are essential for addressing issues related to quality assurance, payment structures, legal compliance, and coordination among multiple actors. By addressing these challenges, contracting models for distributed manufacturing can unlock its potential for more localized, efficient, and sustainable production systems. A system prototype has been developed to simplify contracting for distributed manufacturing. This tool allows buyers to manage orders across multiple manufacturers using a single interface, automating workflows to ensure clarity and accountability for everyone involved. This research was led by the Internet of Production, as part of the mAkE project (African European Maker Innovation Ecosystem), funded by the European Horizon 2020 research and innovation programme.

    Read more →
  • List & Label

    List & Label

    List & Label is a professional reporting tool for software developers. It provides comprehensive design, print and export functions. The software component runs on Microsoft Windows and can be implemented in desktop, cloud and web applications. List & Label can be used to create user-defined dashboards, lists, invoices, forms and labels. It supports many development environments, frameworks and programming languages such as Microsoft Visual Studio, Embarcadero RAD Studio, .NET Framework, .NET Core, ASP.NET, C++, Delphi, Java, C Sharp and some more. List & Label either retrieves data from various sources via data binding, or works database independent. Reports are designed and created in the so-called List & Label Designer and then exported into a multitude of formats like PDF, Excel, XHTML and RTF. Since version 27 a web report designer for ASP.NET MVC is available. == History == The product was first released in 1992 by combit. The current version is 30. A new major version of List & Label is released every fall, usually in October. Updates are available several times a year via Service Pack. == Features == === Report Designer === The Designer enables users to graphically layout the report. It offers report objects such as tables, charts, crosstabs, gauges, HTML, conditionally formatted text, barcodes, matrix codes, and graphics, and is extensible using third-party add-ons. User applications can interact with the report via the programmable object model of the report. The real-time preview functionality allows users to view changes instantly. Usability features include layer and appearance management, enabling conditional logic to dynamically control the visibility of objects in reports. The Designer also supports the inclusion of multiple report containers in a single project, accommodating complex layouts such as parallel tables and charts. A formula wizard and support for scripting languages such as C# facilitate advanced calculations and logic. The Designer's object model (DOM) provides developers with the ability to modify layouts and behaviors programmatically. === Web Report Designer === The web report designer works browser-based and independent from printer drivers and spoolers - that makes deployments to the cloud easier. Just like the use of the Visual Studio deployment pipeline. === Data Sources === Depending on the programming language, the product offers automatic support for data sources: Databases such as Microsoft SQL Server, Oracle, MySQL, PostgreSQL, IBM Db2, SQLite, MariaDB, MongoDB, Cosmos DB XML data, CSV Business objects Data sources that can be accessed via OLE DB, ODBC or ADO.NET LINQ data and data from web services GraphQL Additionally, the product offers support for unbound data and can be extended to support other data sources via interfaces. === Output Options === Printer Image Formats (JPEG, BMP, EMF, TIFF, PNG, SVG, HEIF, WebP) Document Formats: PDF, PDF/A, Word (DOCX), Excel (XLS), PowerPoint (PPTX) HTML, XHTML, MHTML Barcodes Plain Text, RTF, CSV, JSON XML, ZIP, Email, JSON List & Label preview file === Target Audience === List & Label can be used in Windows development environments. While it competes most notably on the Microsoft .NET platform with other products such as Crystal Reports, SQL Server Reporting Services, ActiveReports, there are few competing products for other programming languages (e.g. Progress, Alaska Xbase++, Visual DataFlex). == Awards == Reader's Choice Award 2005–2008 Stevie Awards 2021: Best Technology for Data Visualization Top 100 Publisher Award Component Source 2013-2014, 2014-2015,2016, 2018, 2019, 2020, 2021, 2022

    Read more →
  • Google Cloud Dataflow

    Google Cloud Dataflow

    Google Cloud Dataflow is a fully managed service for executing Apache Beam pipelines within the Google Cloud Platform ecosystem. Dataflow provides a fully managed service for executing Apache Beam pipelines, offering features like autoscaling, dynamic work rebalancing, and a managed execution environment. Dataflow is suitable for large-scale, continuous data processing jobs, and is one of the major components of Google's big data architecture on the Google Cloud Platform. At its core, Dataflow's architecture is designed to abstract away infrastructure management, allowing developers to focus purely on the logic of their data processing tasks. When a pipeline written using the Apache Beam SDK is submitted, Dataflow translates this high-level definition into an optimized job graph. The service then provisions and manages a fleet of Google Compute Engine workers to execute this graph in a highly parallelized and fault-tolerant manner. This serverless approach, combined with intelligent autoscaling of both the number of workers (horizontal) and the resources per worker (vertical), ensures that jobs have the precise amount of computational power needed at any given time, optimizing both performance and cost. The service's deep integration with the Google Cloud ecosystem makes it a powerful tool for a variety of use cases beyond simple data movement. For real-time analytics, Dataflow can ingest unbounded streams of data from Cloud Pub/Sub, perform complex transformations, and load results into BigQuery for immediate querying. In machine learning workflows, it is commonly used to preprocess and transform massive datasets stored in Cloud Storage, preparing them for training models in Vertex AI. This versatility makes it the central processing engine for modern ETL (Extract, Transform, Load) operations, streaming analytics, and large-scale data preparation within the cloud. == History == Google Cloud Dataflow was announced in June, 2014 and released to the general public as an open beta in April, 2015. In January, 2016 Google donated the underlying SDK, the implementation of a local runner, and a set of IOs (data connectors) to access Google Cloud Platform data services to the Apache Software Foundation. The donated code formed the original basis for Apache Beam. In August 2022, there was an incident where user timers were broken for certain Dataflow streaming pipelines in multiple regions, which was later resolved. Throughout 2023 and 2024, there have been various other updates and incidents affecting Google Cloud Dataflow, as documented in the release notes and service health history. The donation of the Dataflow SDK to the Apache Software Foundation was a pivotal moment, establishing Apache Beam as a unified, open-source programming model for defining both batch and streaming data pipelines. This strategic move decoupled the pipeline definition from the execution engine. As a result, developers could write portable data processing logic that was not locked into Google's ecosystem. A Beam pipeline can be executed on various runners, including Apache Flink, Apache Spark, and, of course, the highly optimized Google Cloud Dataflow service, providing flexibility and future-proofing data processing investments. == Features == Google Cloud Dataflow supports both batch and streaming data processing pipelines. It automatically handles resource provisioning, data sharding, and scaling according to workload, reducing manual configuration needed for large-scale data operations. == Use cases == Dataflow is used for ETL (Extract, Transform, Load) data pipelines, real-time analytics, and event stream processing for companies in industries such as finance, advertising, and IoT.

    Read more →
  • Toad Data Modeler

    Toad Data Modeler

    Toad Data Modeler is a database design tool allowing users to visually create, maintain, and document new or existing database systems, and to deploy changes to data structures across different platforms. It is used to construct logical and physical data models, compare and synchronize models, generate complex SQL/DDL, create and modify scripts, and reverse and forward engineer databases and data warehouse systems. Toad's data modelling software is used for database design, maintenance and documentation. == Product History == Toad Data Modeler was previously called "CASE Studio 2" before it was acquired from Charonware by Quest Software in 2006. Quest Software was acquired by Dell on September 28, 2012. On October 31, 2016, Dell finalized the sale of Dell Software to Francisco Partners and Elliott Management, which relaunched on November 1, 2016 as Quest Software. == Features/Usages == Multiple database support - Connect multiple databases natively and simultaneously, including Oracle, SAP, MySQL, SQL Server, PostgreSQL, Db2, Ingres, and Microsoft Access. Data modelling tool - Create database structures or make changes to existing models automatically and provide documentation on multiple platforms. Logical and physical modelling - Build complex logical and physical entity relationship models and reverse, forward, and engineer databases. Reporting - Generate detailed reports on existing database structures. Model customization - Add logical data to user diagrams to customize user models. All Toad products typically have 2 releases per year. == Other features == Model Actions (Compare Models, Convert Model, Merge Models, Generate Change Script) Version Control System (Apache Subversion) Naming Conventions Auto Layout Multiple Workspaces Scripting and Customization Automation Object Gallery Full Unicode Support Integration with Toad for Oracle == Related Software == Erwin Data Modeler Oracle SAP MySQL SQL Server PostgreSQL IBM Db2 Ingres Microsoft Access

    Read more →
  • Inductive probability

    Inductive probability

    Inductive probability attempts to give the probability of future events based on past events. It is the basis for inductive reasoning, and gives the mathematical basis for learning and the perception of patterns. It is a source of knowledge about the world. There are three sources of knowledge: inference, communication, and deduction. Communication relays information found using other methods. Deduction establishes new facts based on existing facts. Inference establishes new facts from data. Its basis is Bayes' theorem. Information describing the world is written in a language. For example, a simple mathematical language of propositions may be chosen. Sentences may be written down in this language as strings of characters. But in the computer it is possible to encode these sentences as strings of bits (1s and 0s). Then the language may be encoded so that the most commonly used sentences are the shortest. This internal language implicitly represents probabilities of statements. Occam's razor says the "simplest theory, consistent with the data is most likely to be correct". The "simplest theory" is interpreted as the representation of the theory written in this internal language. The theory with the shortest encoding in this internal language is most likely to be correct. == History == Probability and statistics was focused on probability distributions and tests of significance. Probability was formal, well defined, but limited in scope. In particular its application was limited to situations that could be defined as an experiment or trial, with a well defined population. Bayes's theorem is named after Rev. Thomas Bayes 1701–1761. Bayesian inference broadened the application of probability to many situations where a population was not well defined. But Bayes' theorem always depended on prior probabilities, to generate new probabilities. It was unclear where these prior probabilities should come from. Ray Solomonoff developed algorithmic probability which gave an explanation for what randomness is and how patterns in the data may be represented by computer programs, that give shorter representations of the data circa 1964. Chris Wallace and D. M. Boulton developed minimum message length circa 1968. Later Jorma Rissanen developed the minimum description length circa 1978. These methods allow information theory to be related to probability, in a way that can be compared to the application of Bayes' theorem, but which give a source and explanation for the role of prior probabilities. Marcus Hutter combined decision theory with the work of Ray Solomonoff and Andrey Kolmogorov to give a theory for the Pareto optimal behavior for an Intelligent agent, circa 1998. === Minimum description/message length === The program with the shortest length that matches the data is the most likely to predict future data. This is the thesis behind the minimum message length and minimum description length methods. At first sight Bayes' theorem appears different from the minimimum message/description length principle. At closer inspection it turns out to be the same. Bayes' theorem is about conditional probabilities, and states the probability that event B happens if firstly event A happens: P ( A ∧ B ) = P ( B ) ⋅ P ( A | B ) = P ( A ) ⋅ P ( B | A ) {\displaystyle P(A\land B)=P(B)\cdot P(A|B)=P(A)\cdot P(B|A)} becomes in terms of message length L, L ( A ∧ B ) = L ( B ) + L ( A | B ) = L ( A ) + L ( B | A ) . {\displaystyle L(A\land B)=L(B)+L(A|B)=L(A)+L(B|A).} This means that if all the information is given describing an event then the length of the information may be used to give the raw probability of the event. So if the information describing the occurrence of A is given, along with the information describing B given A, then all the information describing A and B has been given. ==== Overfitting ==== Overfitting occurs when the model matches the random noise and not the pattern in the data. For example, take the situation where a curve is fitted to a set of points. If a polynomial with many terms is fitted then it can more closely represent the data. Then the fit will be better, and the information needed to describe the deviations from the fitted curve will be smaller. Smaller information length means higher probability. However, the information needed to describe the curve must also be considered. The total information for a curve with many terms may be greater than for a curve with fewer terms, that has not as good a fit, but needs less information to describe the polynomial. === Inference based on program complexity === Solomonoff's theory of inductive inference is also inductive inference. A bit string x is observed. Then consider all programs that generate strings starting with x. Cast in the form of inductive inference, the programs are theories that imply the observation of the bit string x. The method used here to give probabilities for inductive inference is based on Solomonoff's theory of inductive inference. ==== Detecting patterns in the data ==== If all the bits are 1, then people infer that there is a bias in the coin and that it is more likely also that the next bit is 1 also. This is described as learning from, or detecting a pattern in the data. Such a pattern may be represented by a computer program. A short computer program may be written that produces a series of bits which are all 1. If the length of the program K is L ( K ) {\displaystyle L(K)} bits then its prior probability is, P ( K ) = 2 − L ( K ) {\displaystyle P(K)=2^{-L(K)}} The length of the shortest program that represents the string of bits is called the Kolmogorov complexity. Kolmogorov complexity is not computable. This is related to the halting problem. When searching for the shortest program some programs may go into an infinite loop. ==== Considering all theories ==== The Greek philosopher Epicurus is quoted as saying "If more than one theory is consistent with the observations, keep all theories". As in a crime novel all theories must be considered in determining the likely murderer, so with inductive probability all programs must be considered in determining the likely future bits arising from the stream of bits. Programs that are already longer than n have no predictive power. The raw (or prior) probability that the pattern of bits is random (has no pattern) is 2 − n {\displaystyle 2^{-n}} . Each program that produces the sequence of bits, but is shorter than the n is a theory/pattern about the bits with a probability of 2 − k {\displaystyle 2^{-k}} where k is the length of the program. The probability of receiving a sequence of bits y after receiving a series of bits x is then the conditional probability of receiving y given x, which is the probability of x with y appended, divided by the probability of x. ==== Universal priors ==== The programming language affects the predictions of the next bit in the string. The language acts as a prior probability. This is particularly a problem where the programming language codes for numbers and other data types. Intuitively we think that 0 and 1 are simple numbers, and that prime numbers are somehow more complex than numbers that may be composite. Using the Kolmogorov complexity gives an unbiased estimate (a universal prior) of the prior probability of a number. As a thought experiment an intelligent agent may be fitted with a data input device giving a series of numbers, after applying some transformation function to the raw numbers. Another agent might have the same input device with a different transformation function. The agents do not see or know about these transformation functions. Then there appears no rational basis for preferring one function over another. A universal prior insures that although two agents may have different initial probability distributions for the data input, the difference will be bounded by a constant. So universal priors do not eliminate an initial bias, but they reduce and limit it. Whenever we describe an event in a language, either using a natural language or other, the language has encoded in it our prior expectations. So some reliance on prior probabilities are inevitable. A problem arises where an intelligent agent's prior expectations interact with the environment to form a self reinforcing feed back loop. This is the problem of bias or prejudice. Universal priors reduce but do not eliminate this problem. === Universal artificial intelligence === The theory of universal artificial intelligence applies decision theory to inductive probabilities. The theory shows how the best actions to optimize a reward function may be chosen. The result is a theoretical model of intelligence. It is a fundamental theory of intelligence, which optimizes the agents behavior in, Exploring the environment; performing actions to get responses that broaden the agents knowledge. Competing or co-operating with another agent; games. Balancing short and long term rewards. In general no agent will always provi

    Read more →
  • Foveated rendering

    Foveated rendering

    Foveated rendering is a rendering technique which uses an eye tracker integrated with a virtual reality headset to reduce the rendering workload by greatly reducing the image quality in the peripheral vision (outside of the zone gazed by the fovea). A less sophisticated variant called fixed foveated rendering doesn't utilise eye tracking and instead assumes a fixed focal point. == History == Research into foveated rendering dates back at least to 1991. At Tech Crunch Disrupt SF 2014, Fove unveiled a headset featuring foveated rendering. This was followed by a successful kickstarter in May 2015. At CES 2016, SensoMotoric Instruments (SMI) demoed a new 250 Hz eye tracking system and a working foveated rendering solution. It resulted from a partnership with camera sensor manufacturer Omnivision who provided the camera hardware for the new system. In July 2016, Nvidia demonstrated during SIGGRAPH a new method of foveated rendering claimed to be invisible to users. In February 2017, Qualcomm announced their Snapdragon 835 Virtual Reality Development Kit (VRDK) which includes foveated rendering support called Adreno Foveation. == Use == According to chief scientist Michael Abrash at Oculus, utilising foveated rendering in conjunction with sparse rendering and deep learning image reconstruction has the potential to require an order of magnitude fewer pixels to be rendered in comparison to a full image. Later, these results have been demonstrated and published. In December 2019, fixed foveated rendering support was added to the Oculus Quest SDK. A number of VR headsets have included on-board eye tracking to provide support for foveated rendering, including HTC's Vive Pro Eye (2019), Meta Quest Pro (2022), PlayStation VR2 (2023), and Apple Vision Pro (2024). In 2025, Valve announced the upcoming Steam Frame headset, which applies a variation of the technique known as "foveated streaming" for wireless streaming from a PC to the headset; the method similarly uses variance in bit rate, and is performed at the encoder level rather than the software level.

    Read more →
  • List of publications in data science

    List of publications in data science

    This is a list of publications in data science, generally organized by order of use in a data analysis workflow. See the list of publications in statistics for more research-based and fundamental publications; while this list is more applied, business oriented, and cross-disciplinary. General article inclusion criteria are: Papers from notable practitioners or notable professors, either with a Wikipedia page or reference to their notability Common knowledge all data professionals should know, with references validating this claim Highly cited applied statistics and machine learning publications Discussion-facilitating papers on the field of data science as a whole (for example, the Attention Is All You Need paper is arguably a landmark paper that can be added here, but it is specific to generative artificial intelligence, not for all practitioners of data) Some reasons why a particular publication might be regarded as important: Topic creator – A publication that created a new topic Breakthrough – A publication that changed scientific knowledge significantly Influence – A publication which has significantly influenced the world or has had a massive impact on the teaching of data science. When possible, a reference is used to validate the inclusion of the publication in this list. == History == Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author) Author: Leo Breiman Publication data: Online version: https://projecteuclid.org/journals/statistical-science/volume-16/issue-3/Statistical-Modeling--The-Two-Cultures-with-comments-and-a/10.1214/ss/1009213726.pdf Description: Describes two cultures of statistics, one using a parsimonious and generative stochastic model, while the other is an algorithmic model with no known mechanism for how the data is generated. Breiman argues that while statistics has traditionally favored using the stochastic model, there is value in expanding the methods that statisticians can use to study phenomenon. Importance: Influence on the philosophies of statisticians right before the increased use of machine learning and deep learning methods. In a 20-year retrospective on this article, "Breiman's words are perhaps more relevant than ever". Notable statisticians at the time wrote opinion pieces about the publication. Although overall critical of the publication, David Cox writes that the publication "contains enough truth and exposes enough weaknesses to be thought-provoking." Bradley Efron commented that this publication is a "stimulating paper". Emanuel Parzen also comments about this publication that "Breiman alerts us to systematic blunders (leading to wrong conclusions) that have been committed applying current statistical practice of data modeling". Data Scientist: The Sexiest Job of the 21st Century Author: Thomas H. Davenport and DJ Patil Publication data: Online version: hbr.org/2022/07/is-data-scientist-still-the-sexiest-job-of-the-21st-century Description: Describes the new role at companies that is coined "Data scientist", what they do, how an organization might recruit one to their organization, and how to work with one effectively. Importance: This publication has been an influence on the data community as mentioned near the time it was published in 2012 by institutions like IEEE Spectrum, but also mentioned nearly a decade later asking the same question the title poses. In a retrospective response to their own publication 10 years earlier, authors Davenport and Patil have reflected that the role of a data scientist has "become better institutionalized, the scope of the job has been redefined, the technology it relies on has made huge strides, and the importance of non-technical expertise, such as ethics and change management, has grown". 50 Years of Data Science Author: David Donoho Publication data: Online version: https://www.tandfonline.com/doi/full/10.1080/10618600.2017.1384734 Description: Retrospective discussion paper on the history and origins of data science, with a number of commentary from notable statisticians. Importance: This has been described as "the first in the field to present such a comprehensive and in-depth survey and overview", and helps to define the field that has many definitions. The Composable Data Management System Manifesto Author: Pedro Pedreira, Orri Erling, Konstantinos Karanasos, Scott Schneider, Wes McKinney, Satya R Valluri, Mohamed Zait, Jacques Nadeau Publication data: Online version: https://www.vldb.org/pvldb/vol16/p2679-pedreira.pdf Description: The vision paper advocating for a paradigm shift in how data management systems are designed using standard, composable, interoperable tools rather than siloed software tools. Importance: A paradigm shifting view on how future data science software tools should be designed for more efficient workflows, the principles of which "will be especially crucial for addressing fragmentation, improving interoperability, and promoting user-centricity as data ecosystems grow increasingly complex". == Data collection and organization == Tidy Data Author: Hadley Wickham Publication data: Online version: https://www.jstatsoft.org/article/view/v059i10/ https://vita.had.co.nz/papers/tidy-data.pdf Description: Describes a framework for data cleaning that is summarized in the quote, "each variable is a column, each observation is a row, and each type of observational unit is a table". This allows a standard data structure for which data analysis tools can be consistently built around. Importance: Cited over 1,500 times, this effort for tidy data has been described by David Donoho as having "more impact on today's practice of data analysis than many highly regarded theoretical statistics articles". In the context of data visualization, this publication is said to support "efficient exploration and prototyping because variables can be assigned different roles in the plot without modifying anything about the original dataset". Data Organization in Spreadsheets Author: Karl W. Broman and Kara H. Woo Publication data: Online version: https://www.tandfonline.com/doi/full/10.1080/00031305.2017.1375989 Description: This article offers practical recommendations for organizing data in spreadsheets, like Microsoft Excel and Google Sheets, to reduce errors and lower the barrier for later analyses due to limitations in spreadsheets or quirks in the software. Importance: Influences teaching both data and non-data practitioners to create more analysis-friendly spreadsheets, and has been described to outline "spreadsheet best practices". == Data visualizations == Quantitative Graphics in Statistics: A Brief History Author: James R. Beniger and Dorothy L. Robyn Publication data: Online version: https://www.jstor.org/stable/2683467 Description: Outlines history and evolution of quantitative graphics in statistics, going through spatial organization (17th and 18th centuries), discrete comparison (18th and 19th centuries), continuous distribution (19th century), and multivariate distribution and correlation (late 19th and 20th centuries). Importance: Helps put into perspective for learning data practitioners the recency of graphics that are used. A later publication "Graphical Methods in Statistics" by Stephen Fienberg in 1979 writes that his publication "owes much to the work of Beniger and Robyn". == Practice == Data Science for Business Author: Foster Provost and Tom Fawcett Publication data: Online version: N/A Description: Broadly outlines principles of data science and data-analytic thinking for businesses. Importance: Cited over 3,000 times, it is "highly recommended for students" but also it is also recommended due to its "relevance to senior management leaders who want to build and lead a team of data scientists and implement data science in solving complex business problems". == Tooling == Hidden Technical Debt in Machine Learning Systems Author: D. Sculley, Gary Holy, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-François Crespo, Dan Dennison Publication data: Online version: https://proceedings.neurips.cc/paper_files/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf Description: This paper argues that it is "dangerous to think of [complex machine learning] quick wins as coming for free" and overviews risk factors to account for when implementing a machine learning system. Importance: All authors worked for Google, article is cited over 2,000 times, and helped practitioners thinking about quickly implementing a machine learning tool without understanding the long-term maintenance of the tool. A few useful things to know about machine learning Author: Pedro Domingos Publication data: Online version: https://dl.acm.org/doi/10.1145/2347736.2347755 https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf Description: The purpose of this paper is to distill inaccessible "folk knowledge" to effectively implement machine learning projects because "machin

    Read more →