Indiana Elevation DataCap Application: A Detailed Analysis
Introduction
Guys, let's dive into an interesting DataCap application! We're talking about the Indiana Statewide Elevation Catalog, a treasure trove of LiDAR data managed by the Indiana Geographic Information Office and the Indiana Office of Technology (IOT). This data is valuable for many kinds of projects and analyses, and the applicant wants to store a significant portion of it on the Filecoin network. Let's look at the details of this application and see what it's all about.
Project Overview
Data Owner and Background
The Indiana Statewide Elevation Catalog is listed as the data owner. It is a collection of digital LiDAR LAS files managed by the State of Indiana Geographic Information Office and the Indiana Office of Technology and hosted on AWS. The archive starts with the 2011-2013 collection and includes the NRCS-funded 2016-2020 collection. The files are provided as uncompressed LAS for cloud storage and access. Each collection year is organized into a tile grid covering the whole state, and each tile is named after its lower-left coordinate, so individual tiles are easy to locate and retrieve. The data supports research, planning, and other applications in Indiana and in the broader scientific community. A copy on Filecoin would give researchers and organizations another publicly retrievable source for it.
Data Specifications and Storage Requirements
The applicant is requesting a whopping 300 TiB of DataCap! A single copy of the dataset is expected to be about 35 TiB, and eight replicas are planned for redundancy and accessibility. The requested weekly allocation is 100 TiB. The on-chain address for the first allocation is f1n4zgam2zn56bgqbv4qiznlz32cpurhy7iuowhwa. Eight replicas of a 35 TiB copy come to 280 TiB, so the request covers the planned replication with a small margin. With eight replicas, the dataset stays retrievable even if several storage providers go offline.
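The relationship between the requested figures can be checked with a short Python sketch. It assumes every replica is a full copy of the roughly 35 TiB dataset and ignores Filecoin padding and packaging overhead. The constant and function names are illustrative and do not come from the application.

```python
import math

# Figures as stated in the application (all in TiB)
TOTAL_DATACAP_TIB = 300
DATASET_COPY_TIB = 35
REPLICAS = 8
WEEKLY_ALLOCATION_TIB = 100

def replicated_footprint(copy_tib: float, replicas: int) -> float:
    """Total storage consumed if every replica holds a full copy."""
    return copy_tib * replicas

def weeks_to_allocate(total_tib: float, weekly_tib: float) -> int:
    """Number of weekly tranches needed to reach the full request."""
    return math.ceil(total_tib / weekly_tib)

footprint = replicated_footprint(DATASET_COPY_TIB, REPLICAS)
headroom = TOTAL_DATACAP_TIB - footprint
print(f"replicated footprint: {footprint} TiB")   # 280 TiB
print(f"headroom in request:  {headroom} TiB")    # 20 TiB
print(f"weekly tranches:      {weeks_to_allocate(TOTAL_DATACAP_TIB, WEEKLY_ALLOCATION_TIB)}")  # 3
```

Under these assumptions, the 300 TiB request leaves about 20 TiB beyond eight full copies. At 100 TiB per week, the full request would be reached in three allocations.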
Data Type and Accessibility
This is a public, open dataset intended for research and non-profit use. Anyone can access and use the data, which is great for collaboration and innovation. The files are stored as uncompressed LAS. LAS is the standard interchange format for LiDAR point clouds and is read by most geospatial software, so users can work with the files directly. Uncompressed storage does not add precision compared with a lossless compressed format such as LAZ. It does avoid a decompression step, and it is one reason the dataset is so large. A public copy on Filecoin would give researchers in many disciplines another way to retrieve the data for detailed analysis and modeling.
Data Details
Data Origin and Nature
The data is sourced from AWS cloud storage and consists of digital LiDAR LAS files. These files contain detailed elevation point data, which is used in environmental monitoring, urban planning, disaster management, and similar applications. The files are grouped by collection year (2011-2013 and the NRCS-funded 2016-2020 collection) into a statewide tile grid, and each tile is named after its lower-left coordinate. AWS is only the hosting platform. The data's authority comes from the State of Indiana offices that produced and maintain it. Because LAS is the standard format in the geospatial community, the data works with existing software and workflows.
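To make the LAS format concrete, here is a minimal Python sketch that reads a few fields from a LAS public header block using only the standard library. It assumes the LAS 1.0-1.2 header layout. LAS 1.3 and 1.4 append fields to the same header but keep these offsets. `read_las_header` is a hypothetical helper, not part of any existing tool. The minimum X/Y extent it returns should be close to a tile's lower-left coordinate, which gives a rough way to spot-check tile names.

```python
import struct

LAS_HEADER_MIN_BYTES = 227  # size of the LAS 1.0-1.2 public header block

def read_las_header(buf: bytes) -> dict:
    """Parse selected fields of a LAS public header block (little-endian)."""
    if len(buf) < LAS_HEADER_MIN_BYTES or buf[:4] != b"LASF":
        raise ValueError("not a LAS file (missing 'LASF' signature)")
    # Version major/minor are single bytes at offsets 24 and 25
    major, minor = struct.unpack_from("<BB", buf, 24)
    # Point data format (uint8) and point record length (uint16)
    point_format, record_len = struct.unpack_from("<BH", buf, 104)
    # Legacy number of point records (uint32)
    (point_count,) = struct.unpack_from("<I", buf, 107)
    # Extents are stored as doubles in the order MaxX, MinX, MaxY, MinY, MaxZ, MinZ
    max_x, min_x, max_y, min_y, max_z, min_z = struct.unpack_from("<6d", buf, 179)
    return {
        "version": f"{major}.{minor}",
        "point_format": point_format,
        "record_length": record_len,
        "point_count": point_count,
        "min_xyz": (min_x, min_y, min_z),
        "max_xyz": (max_x, max_y, max_z),
    }
```

For a downloaded tile, pass the first 227 bytes of the file to `read_las_header`. For example, open the file in binary mode and call `read(227)` on it.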
Data Preparation
The application states that the data preparer is located in India but gives no details on the preparation process or the tools used. This should be clarified. Reviewers need to know how the LAS files will be packaged for Filecoin deals, and how the prepared copies can be checked against the AWS originals, before they can judge data quality and consistency. Without this information, it is hard to confirm that the data meets the standards for long-term storage and retrieval on the Filecoin network.
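The application does not describe its preparation pipeline, so this sketch only estimates the scale of the job under stated assumptions. It assumes 32 GiB sectors and Fr32 padding, where 254 of every 256 bits carry data, so 127/128 of a sector's raw size holds payload. It also assumes perfectly packed pieces with no CAR or index overhead. Real pipelines will need somewhat more sectors, so treat the result as a lower bound. The helper names are hypothetical.

```python
import math

GIB = 1024 ** 3
TIB = 1024 ** 4

def unpadded_capacity(sector_bytes: int) -> int:
    """Usable payload of a sector after Fr32 padding (127/128 of raw size)."""
    return sector_bytes * 127 // 128

def sectors_needed(payload_bytes: int, sector_bytes: int = 32 * GIB) -> int:
    """Lower bound on sectors for one copy, assuming perfectly packed pieces."""
    return math.ceil(payload_bytes / unpadded_capacity(sector_bytes))

per_copy = sectors_needed(35 * TIB)
print(per_copy)        # 1129 sectors of 32 GiB for one ~35 TiB copy
print(per_copy * 8)    # 9032 sectors across eight replicas
```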
Data Sample and Access
A sample of the data is available on AWS and can be browsed with the AWS CLI, so anyone can inspect the data structure and content before deciding to use it. The State of Indiana Elevation archive is stored in an S3 bucket with the Amazon Resource Name (ARN) arn:aws:s3:::giselevationingov in the AWS Region us-east-2. It can be listed with the command aws s3 ls --no-sign-request s3://giselevationingov/. Because the command uses unsigned requests, no AWS account or credentials are needed to evaluate the data. A readily accessible sample is good practice in data sharing, and it makes the application easier to verify.
Data Usage and Distribution
Retrieval Frequency and Storage Duration
The expected retrieval frequency for this data is yearly, and the plan is to keep the dataset on Filecoin for 1.5 to 2 years. Yearly retrieval suggests the data is meant for periodic analysis rather than continuous access, and over the planned storage period that means only one or two expected retrievals. A 1.5 to 2 year term is a medium-term commitment rather than long-term preservation, though it does keep the data available for follow-up research during that window. Together, the retrieval frequency and storage duration give a clear picture of how the applicant expects the data to be used.
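Filecoin expresses deal durations in chain epochs of 30 seconds, which is 2,880 epochs per day. The short Python sketch below converts the planned 1.5 to 2 year term into epochs, assuming 365-day years. The application does not state deal durations, so these numbers are only a reference point.

```python
EPOCH_SECONDS = 30
EPOCHS_PER_DAY = 24 * 60 * 60 // EPOCH_SECONDS  # 2880

def years_to_epochs(years: float, days_per_year: float = 365) -> int:
    """Convert a storage period in years into Filecoin chain epochs."""
    return round(years * days_per_year * EPOCHS_PER_DAY)

for years in (1.5, 2):
    print(years, years_to_epochs(years))
# 1.5 1576800
# 2 2102400
```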
Storage Deal Geographies and Distribution Methods
The applicant plans to make storage deals in Greater China and in other parts of Asia. Data will be sent to storage providers via HTTP or FTP servers, hard drive shipping, and IPFS. Offering several transfer methods lets providers with different network capacity and infrastructure take on the data. The geographic plan is narrower: every deal is in Asia, so the replicas will be concentrated in one region rather than spread across continents. That may suit users in the region but does little for geographic redundancy.
Storage Provider Selection
Storage providers were found via Slack, and the application lists specific provider IDs and locations in Hong Kong and XinJiang. Finding providers through Slack reflects the role of community engagement in the Filecoin ecosystem. Listing provider IDs is commendable because it lets reviewers verify where the replicas will be stored and hold providers accountable. However, every listed provider is in China, so the list offers limited geographic diversity for a dataset meant to serve a global audience.
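The provider list in the application can be tallied by location with a few lines of Python. The sketch pairs each provider ID with the location that follows it in the original run-together list, and it treats "HongKong" and "Hong Kong" as the same place. This pairing is an interpretation of that list rather than a confirmed mapping.

```python
from collections import Counter

# Provider IDs and locations as listed in the application
# (assumed pairing; "HongKong" normalized to "Hong Kong").
PROVIDERS = {
    "f03601451": "Hong Kong",
    "f03609158": "Hong Kong",
    "f02826762": "Hong Kong",
    "f02827135": "Hong Kong",
    "f02827010": "XinJiang",
    "f02825281": "XinJiang",
    "f03623232": "Hong Kong",
    "f03610683": "Hong Kong",
}
REPLICAS = 8

by_location = Counter(PROVIDERS.values())
print(by_location)                  # Counter({'Hong Kong': 6, 'XinJiang': 2})
print(len(PROVIDERS) >= REPLICAS)   # True: one replica per provider is possible
```

With eight providers and eight planned replicas, each provider could hold a single copy. Under this pairing, six of the eight copies would sit in Hong Kong.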
Filecoin Guidelines and Deal Making
The applicant confirms that they will follow the Fil+ guidelines, which is a basic requirement for any DataCap application and important for the integrity of the network. However, the application does not describe how deals will be made with storage providers. A clearer plan would make it easier to judge whether the data will be stored efficiently and cost-effectively. For example, the plan could explain how the eight replicas will be assigned to the listed providers and on what schedule.
Conclusion
Overall, this DataCap application for the Indiana Statewide Elevation Catalog is promising. The dataset is valuable, publicly accessible, and has clear use cases. However, more detail on data preparation and deal-making would strengthen the application, as would correcting the inconsistent metadata noted below. I'm excited to see how this project can use the Filecoin network to make this important data even more accessible!
What is the Data Owner Name?
The Data Owner is the Indiana Statewide Elevation Catalog.
What Country/Region is the Data Owner from?
The Data Owner is listed as being from Afghanistan, which appears to be an error since the data comes from Indiana, USA. This needs to be verified and corrected.
What Industry does the Data Owner belong to?
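The Data Owner is listed under the Life Science / Healthcare industry. This also looks inconsistent with a state geographic information office publishing LiDAR elevation data and should be verified.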
What is the Website associated with the data?
The website is https://registry.opendata.aws/in-elevation/.
What Social Media Handle is associated with the data?
The social media handle is also https://registry.opendata.aws/in-elevation/, which seems to be a duplication or error. This needs clarification.
What Social Media Type is used?
Slack is the social media type.
What is the Data Preparer's role?
The applicant's role is Data Preparer.
What is the Total amount of DataCap being requested?
A total of 300 TiB of DataCap is being requested.
What is the Expected size of a single dataset copy?
The expected size is 35 TiB.
How many replicas are planned to be stored?
8 replicas are planned.
What is the Weekly allocation of DataCap requested?
The weekly allocation is 100 TiB.
What is the On-chain address for the first allocation?
The address is f1n4zgam2zn56bgqbv4qiznlz32cpurhy7iuowhwa.
What Data Type is the Application?
The application is for a Public, Open Dataset (Research/Non-Profit).
Is a Custom multisig being used?
No, a custom multisig is not being used.
Brief history of the project and organization?
The State of Indiana Geographic Information Office and IOT Office of Technology manage a series of digital LiDAR LAS files stored in AWS, dating back to the 2011-2013 collection and including the NRCS-funded 2016-2020 collection. These LiDAR datasets are available as uncompressed LAS files, for cloud storage and access. Each year's data is organized into a tile grid scheme covering the entire geography of Indiana, ensuring easy access and efficient processing. The tiles' naming reflects each tile's lower left coordinate, facilitating accurate data management and retrieval. The AWS storage solution ensures that these extensive datasets are readily accessible for analysis and application across various projects.
Is this project associated with other projects/ecosystem stakeholders?
No, this project is not associated with other projects/ecosystem stakeholders.
Description of the data being stored onto Filecoin?
The data consists of digital LiDAR LAS files managed by the State of Indiana Geographic Information Office and IOT Office of Technology, stored in AWS, dating back to the 2011-2013 collection and including the NRCS-funded 2016-2020 collection. These datasets are available as uncompressed LAS files, organized into a tile grid scheme covering the entire geography of Indiana, ensuring easy access and efficient processing.
Where was the data currently stored in this dataset sourced from?
The data is sourced from AWS Cloud.
What is the Data Preparer's location (Country/Region)?
The Data Preparer is located in India.
How will the data be prepared?
This information is missing in the application and needs clarification.
Has this dataset been stored on the Filecoin network before?
No, this dataset has not been stored on the Filecoin network before.
Please share a sample of the data.
Resources are available on AWS:
- Description: State of Indiana Elevation archive.
- Resource type: S3 Bucket
- Amazon Resource Name (ARN): arn:aws:s3:::giselevationingov
- AWS Region: us-east-2
- AWS CLI Access: aws s3 ls --no-sign-request s3://giselevationingov/
Is this a public dataset that can be retrieved by anyone on the Network?
Yes, this is a public dataset.
What is the expected retrieval frequency for this data?
The expected retrieval frequency is Yearly.
For how long do you plan to keep this dataset stored on Filecoin?
The plan is to store the data for 1.5 to 2 years.
In which geographies do you plan on making storage deals?
The plan is to make storage deals in Greater China and in Asia outside Greater China.
How will you be distributing your data to storage providers?
The data will be distributed via HTTP or FTP server, shipping hard drives, and IPFS.
How did you find your storage providers?
Storage providers were found via Slack.
Please list the provider IDs and location of the storage providers you will be working with.
The providers include:
- f03601451: Hong Kong
- f03609158: Hong Kong
- f02826762: Hong Kong
- f02827135: Hong Kong
- f02827010: XinJiang
- f02825281: XinJiang
- f03623232: Hong Kong
- f03610683: Hong Kong
How do you plan to make deals to your storage providers?
This information is missing in the application and needs clarification.
Can you confirm that you will follow the Fil+ guideline?
Yes, the applicant confirms they will follow the Fil+ guideline.