Date: 2025-11-24

Frequently Asked Questions (FAQ)

General Questions

Q: What makes VibeCodingEval different from traditional code evaluation platforms like HackerRank or LeetCode?

A: Unlike traditional platforms that focus on algorithmic correctness and test cases, this system evaluates complete project submissions across multiple dimensions, including innovation, UX, scalability, and business viability. It processes not just code metadata but also video presentations and documentation, providing a holistic, explainable assessment of a project's potential.

Q: How long does it take to evaluate a single submission?

A: On average, a complete evaluation of one submission takes approximately 180 seconds (3 minutes). This includes processing all three input types (text, video, code) and running all seven evaluation agents. Because evaluations can be massively parallelized, the overall time for large batches drops significantly: in the use case discussed above, the effective average evaluation time per submission turned out to be just 1.8 seconds.

Q: What's the cost of running evaluations at scale?

A: Based on our performance metrics, evaluating 30,000+ submissions costs approximately $6,340 in AI token usage. This translates to roughly $0.21 per submission, which is significantly more cost-effective than human evaluation: the same volume would require ~9 full-time employees for a year.
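
The per-submission figure follows directly from the totals quoted above. A quick check, treating 30,000 as the submission count (the answer says "30,000+", so the true per-submission cost would be slightly lower):

```python
# Figures quoted in the answer above; 30,000 is a lower bound on the count.
total_cost_usd = 6340
submissions = 30_000

cost_per_submission = total_cost_usd / submissions
print(f"${cost_per_submission:.2f} per submission")  # $0.21 per submission
```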

Q: Can the system handle non-English submissions?

A: Currently, the system is optimized for English-language submissions. While the code analysis works regardless of language, the text and video transcription components perform best with English content. Multi-language support is only as good as that of the underlying LLM.

Technical Questions

Q: What are the maximum file size limits for submissions?

A: The system has the following limits:

Video files: Maximum 500MB (larger files may time out during transcription)

Repository ZIP files: Maximum 100MB or 20,000 files

Document files: No strict limit, but very large PDFs (>50MB) may process slowly

Total submission package: Recommended under 1GB for optimal performance
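
The limits above could be checked before a submission is queued. The following is a minimal sketch, not the system's actual validation code: the `Submission` class, the constant names, and `check_limits` are hypothetical, and it assumes 1MB = 1024 × 1024 bytes. Hard limits are reported as errors, while the "may process slowly" and "recommended" thresholds are reported as warnings.

```python
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: binary megabytes

# Limits taken from the list above; the names are illustrative only.
MAX_VIDEO_BYTES = 500 * MB
MAX_REPO_ZIP_BYTES = 100 * MB
MAX_REPO_FILES = 20_000
SLOW_DOCUMENT_BYTES = 50 * MB
RECOMMENDED_TOTAL_BYTES = 1024 * MB


@dataclass
class Submission:
    """Hypothetical container for the sizes of one submission package."""
    video_bytes: int
    repo_zip_bytes: int
    repo_file_count: int
    document_bytes: int


def check_limits(sub: Submission) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a submission against the documented limits."""
    errors: list[str] = []
    warnings: list[str] = []
    if sub.video_bytes > MAX_VIDEO_BYTES:
        errors.append("video file exceeds 500MB")
    if sub.repo_zip_bytes > MAX_REPO_ZIP_BYTES:
        errors.append("repository ZIP exceeds 100MB")
    if sub.repo_file_count > MAX_REPO_FILES:
        errors.append("repository contains more than 20,000 files")
    if sub.document_bytes > SLOW_DOCUMENT_BYTES:
        warnings.append("document larger than 50MB may process slowly")
    total = sub.video_bytes + sub.repo_zip_bytes + sub.document_bytes
    if total > RECOMMENDED_TOTAL_BYTES:
        warnings.append("total package exceeds the recommended 1GB")
    return errors, warnings


errors, warnings = check_limits(Submission(600 * MB, 10 * MB, 25_000, 60 * MB))
print(errors)
print(warnings)
```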

Q: Can I use SQLite in production instead of PostgreSQL?

A: While technically possible, we strongly recommend PostgreSQL for production deployments. SQLite lacks the concurrent write capabilities needed for high-volume processing and doesn't scale well beyond a few thousand submissions. PostgreSQL provides better performance, reliability, and support for concurrent operations.

Q: How does the system handle repository dependencies and security scanning?

A: The current version extracts metadata and structure from repositories but doesn't execute code or install dependencies for security reasons. Security scanning can be added through custom agents (see the SecurityAuditAgent example in the blog), but it's not included in the default configuration.

Q: What happens if the Neuro SAN service is temporarily unavailable?

A: The Neuro SAN service runs locally alongside the evaluation system, which reduces the chance of failure. In addition, the system implements retry logic with exponential backoff: failed evaluations are automatically retried up to 3 times with increasing delays. If the service remains unavailable, submissions are queued and can be reprocessed once the service is restored. The Celery task queue ensures no submissions are lost.
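
To illustrate the retry behavior described above, here is a minimal sketch of exponential backoff in plain Python. It is not the system's implementation (which runs inside Celery tasks): the function name `call_with_backoff`, the 1-second base delay, and the use of `ConnectionError` as the failure signal are all assumptions made for the example.

```python
import time


def call_with_backoff(func, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure, retry up to max_retries times with doubling delays."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except ConnectionError:
            if attempt == max_retries:
                raise  # retries exhausted; the caller can re-queue the submission
            # Delay doubles on each retry: 1s, 2s, 4s with the default settings.
            sleep(base_delay * 2 ** attempt)


# Usage with a hypothetical service call that fails twice before succeeding.
attempts = {"count": 0}


def flaky_service_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("service unavailable")
    return "evaluation complete"


print(call_with_backoff(flaky_service_call, base_delay=0.01))
```

The `sleep` parameter is injectable so that the delay schedule can be verified without actually waiting.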

Operational Questions

Q: How do I monitor the evaluation pipeline in real-time?

A: The system provides multiple monitoring options:

Flower Dashboard (http://localhost:5555): Real-time Celery task monitoring

Streamlit Dashboard (http://localhost:8501): Analytics and submission explorer

Database queries: Direct SQL queries for custom metrics

Log files: Detailed logs in the logs/processing/ directory

Q: Can I re-evaluate submissions with updated rubrics or agents?

A: Yes! Use the --override flag when running evaluations:

python deploy/enqueue_eval_tasks.py --override --db-url postgresql://...

This will re-process all submissions with the current agent configurations. You can also target specific submissions using the --filter-source parameter.

Q: How do I scale the system for a 100,000+ submission event?

A: For massive scale:

Deploy multiple Celery workers across different machines

Use Redis or Amazon ElastiCache for the message broker

Implement database read replicas for the analytics dashboard

Consider using S3 for submission storage instead of local files

Increase the semaphore concurrency limit for AI requests

Deploy evaluation agents on GPU-enabled instances for faster processing

Q: What's the best way to handle partial failures in batch processing?

A: The system is designed to be resilient:

Failed tasks are automatically retried

Successfully processed submissions are marked in the database

Use the --range parameter to process specific batches

The system automatically skips already-processed submissions unless --override is specified

Monitor incomplete evaluations using: python eval_database.py --inc

Customization Questions

Q: How do I add a new evaluation dimension (e.g., security or performance)?

A: Adding new evaluation dimensions involves:

Create a new agent using the Neuro SAN data-driven agent setup

Add the new score field to the database schema

Update the SCORE_FIELDS list in process_eval.py

Modify the orchestration logic to include the new agent

Update the dashboard to visualize the new dimension

Q: Can I use a different multi-agent platform instead of Neuro SAN?

A: Yes, the system is designed to be modular. You can replace the SimpleClient class with your own implementation that interfaces with OpenAI, Anthropic, CrewAI or any other multi-agent service. The key is maintaining the same response format (scores + rationale).
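
One way to picture such a replacement is a small interface that every backend satisfies, plus a check that the response keeps the expected shape. This is only a sketch: the `EvaluationClient` protocol, the `evaluate` method, and the `"scores"` / `"rationale"` key names are hypothetical, and the real SimpleClient interface may differ.

```python
from typing import Protocol


class EvaluationClient(Protocol):
    """Hypothetical interface; the real SimpleClient may look different."""

    def evaluate(self, submission_text: str) -> dict: ...


class StubClient:
    """Stand-in backend that returns a fixed response in the expected shape."""

    def evaluate(self, submission_text: str) -> dict:
        return {"scores": {"innovation": 70, "ux": 65}, "rationale": "stub response"}


def validate_response(response: dict) -> dict:
    """Check that a backend's response carries both scores and a rationale."""
    if not isinstance(response.get("scores"), dict):
        raise ValueError("response must contain a 'scores' mapping")
    if not isinstance(response.get("rationale"), str):
        raise ValueError("response must contain a 'rationale' string")
    return response


def run_evaluation(client: EvaluationClient, submission_text: str) -> dict:
    """Evaluate with any backend that satisfies the interface."""
    return validate_response(client.evaluate(submission_text))


print(run_evaluation(StubClient(), "example submission"))
```

A client wrapping another provider would implement the same `evaluate` method and translate that provider's output into this shape.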

Q: How do I customize the scoring rubric for my specific use case?

A: Rubrics are defined in the agent prompts within the agent definitions. To customize:

Access your agent configurations

Modify the prompt to include your specific criteria

Adjust the scoring scale if needed (default is 1-100)

Update the aggregation logic if you want weighted scores
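
As an illustration of weighted aggregation, the sketch below computes a weighted average of per-dimension scores on the default 1-100 scale. The function name `weighted_score`, the dimension keys, and the weights are hypothetical; the system's own aggregation logic may be structured differently.

```python
def weighted_score(scores, weights):
    """Weighted average of per-dimension scores (each on the 1-100 scale)."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[dim] * w for dim, w in weights.items()) / total_weight


# Example: innovation counts twice as much as the other dimensions.
scores = {"innovation": 80, "ux": 60, "scalability": 70}
weights = {"innovation": 2.0, "ux": 1.0, "scalability": 1.0}
print(weighted_score(scores, weights))  # 72.5
```

Dividing by the sum of the weights keeps the result on the same 1-100 scale regardless of how the weights are chosen.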

Q: Can I integrate this with my existing CI/CD pipeline?

A: Absolutely! The system provides CLI interfaces that can be integrated into any pipeline:

# Example GitHub Actions integration

- name: Process Submissions
  run: python process_inputs.py --input-source ${{ github.event.inputs.csv_path }}

- name: Run Evaluations
  run: python deploy/enqueue_eval_tasks.py --filter-source new_submissions.csv

Performance and Optimization Questions

Q: How many concurrent evaluations can the system handle?

A: This depends on your infrastructure:

Default configuration: 8 concurrent AI requests (via semaphore)

Celery workers: Limited by CPU cores and memory

Database: PostgreSQL can handle hundreds of concurrent connections

Practical limit: thousands of concurrent evaluations with proper resource allocation
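
The semaphore limit mentioned above can be illustrated with asyncio. This sketch assumes an asyncio-based request path, which the FAQ does not confirm; the function names and the `asyncio.sleep` placeholder for the AI request are hypothetical. It tracks the peak number of in-flight requests to show that the limit holds.

```python
import asyncio

MAX_CONCURRENT_AI_REQUESTS = 8  # default quoted in the list above


async def evaluate(submission_id, semaphore, tracker):
    """Run one evaluation while holding a semaphore slot."""
    async with semaphore:
        tracker["active"] += 1
        tracker["peak"] = max(tracker["peak"], tracker["active"])
        await asyncio.sleep(0.01)  # placeholder for the real AI request
        tracker["active"] -= 1
    return submission_id


async def evaluate_all(submission_ids, limit=MAX_CONCURRENT_AI_REQUESTS):
    """Evaluate all submissions with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)
    tracker = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(evaluate(s, semaphore, tracker) for s in submission_ids)
    )
    return results, tracker["peak"]


results, peak = asyncio.run(evaluate_all(range(20)))
print(f"evaluated {len(results)} submissions, peak concurrency {peak}")
```

Raising `limit` increases throughput only as long as the AI service and its rate limits can absorb the extra parallel requests.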

Q: What's the most effective way to reduce evaluation costs?

A: Several strategies can reduce costs:

Implement caching for handling observability data

Truncate video transcripts to essential portions

Use smaller AI models for initial screening

Implement smart sampling for very large repositories

Q: How do I debug a stuck evaluation?

A: Follow these steps:

Check Flower dashboard for task status

Review logs in logs/processing/process_eval_*.log

Query the database for incomplete evaluations: python eval_database.py --inc

Check Redis queue length: redis-cli LLEN eval_queue

Examine agent thinking files in logs/ directory

Restart stuck workers if necessary

Data and Privacy Questions

Q: How is sensitive data handled in submissions?

A: The system implements several privacy measures:

Submissions are processed in isolated environments

Temporary files are automatically cleaned up

Database connections use SSL in production

S3 storage uses encryption at rest

No code is executed, only metadata is analyzed statically

Q: Can I export evaluation results for further analysis?

A: Yes, multiple export options are available:

# Export to a pandas DataFrame and write it to CSV (Python)

df = db.get_all_evaluations_as_df()

df.to_csv('evaluations_export.csv')

# Direct SQL export (shell)

python eval_database.py --query "SELECT * FROM evaluations" > results.json

Q: How long is evaluation data retained?

A: By default, all evaluation data is retained indefinitely. You can implement data retention policies by: